Abstract



A collaborative filtering system recommends to users products that similar users like. Collaborative filtering systems influence purchase decisions, and hence have become targets of manipulation by unscrupulous vendors. We provide theoretical and empirical results demonstrating that while common nearest neighbor algorithms, which are widely used in commercial systems, can be highly susceptible to manipulation, two classes of collaborative filtering algorithms, which we refer to as linear and asymptotically linear, are relatively robust. These results provide guidance for the design of future collaborative filtering systems.

1 Introduction

While the expanding universe of products available via Internet commerce provides consumers with valuable options, sifting through the numerous alternatives to identify desirable choices can be challenging. Collaborative filtering (CF) systems aid this process by recommending to users products desired by similar individuals.



At the heart of a CF system is an algorithm that predicts whether a given user will like various products based on his past behavior and that of other users. Nearest neighbor (NN) algorithms, for example, have enjoyed wide use in commercial CF systems, including those of Amazon, Netflix, and YouTube [..., 18, 34]. A prototypical NN algorithm stores each user's history, which may include,
for instance, his product ratings and purchase decisions. To predict whether a particular user will 
like a particular product, the algorithm identifies a number of other users with similar histories. A 
prediction is then generated based on how these so-called neighbors have responded to the product. 
This prediction could be, for example, a weighted average of past ratings supplied by neighbors. 
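The prototypical NN predictor just described can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm of any particular commercial system: it assumes ratings in [0, 1] stored as dictionaries keyed by product, measures similarity with cosine similarity over co-rated products, and uses a fixed neighborhood size `k`. The function names and the toy data are hypothetical.

```python
from math import sqrt


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two users, computed over co-rated products."""
    common = a.keys() & b.keys()
    dot = sum(a[p] * b[p] for p in common)
    norm_a = sqrt(sum(a[p] ** 2 for p in common))
    norm_b = sqrt(sum(b[p] ** 2 for p in common))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no overlap, or all-zero ratings on the overlap
    return dot / (norm_a * norm_b)


def predict_nn(history: dict, others: dict, product: str, k: int = 2):
    """Similarity-weighted average rating of `product` among the k most similar users."""
    candidates = [
        (cosine_similarity(history, ratings), ratings[product])
        for ratings in others.values()
        if product in ratings
    ]
    neighbors = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    total = sum(sim for sim, _ in neighbors)
    if total <= 0.0:
        return None  # no neighbor carries any weight
    return sum(sim * rating for sim, rating in neighbors) / total


# Toy usage: the active user liked product "a" and disliked product "b".
active = {"a": 1.0, "b": 0.0}
others = {
    "u1": {"a": 1.0, "b": 0.0, "c": 1.0},
    "u2": {"a": 1.0, "b": 1.0, "c": 0.0},
    "u3": {"a": 0.0, "b": 1.0, "c": 0.0},
}
print(round(predict_nn(active, others, "c"), 3))  # 0.586
```

Here the two nearest neighbors are u1 (similarity 1) and u2 (similarity about 0.707), so the prediction for product "c" is 1/(1 + 0.707). The same structure makes the vulnerability visible: any identity that copies the active user's ratings on "a" and "b" would enter the neighborhood with similarity 1.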
Because purchase decisions are influenced by CF systems, they have become targets of manipulation by unscrupulous vendors. For instance, a vendor can create multiple online identities and use each to rate his own product highly and competitors' products poorly. As an example, Amazon's CF system was manipulated so that users who viewed a spiritual guide written by a well-known Christian evangelist were subsequently recommended a sex manual for gay men [...]. Although this incident may not have been driven by commercial motives, it highlights the vulnerability of CF systems. The research literature offers further empirical evidence that NN algorithms are susceptible to manipulation [..., 35].



In order to curb manipulation, one might consider authenticating each user by asking for, say, a 
credit card number to limit the number of fake identities. This may be effective in some situations. 
However, in web services that do not facilitate financial transactions, such as YouTube, requiring authentication would intrude on users' privacy and drive them away. One might also consider using only
customer purchase data, when they are available, as a basis for recommendations because they are 
likely generated by honest users. Recommendation quality may be improved, however, if higher- 
volume data such as page views are also properly utilized. 

In this paper, we seek to understand the extent to which manipulators can hurt the performance 
of CF systems and how CF algorithms should be designed to abate their influence. We find 
that, while NN algorithms can be quite sensitive to manipulation, CF algorithms that carry out 
predictions based on a particular class of probabilistic models are surprisingly robust. For reasons 
that we will explain in the paper, we will refer to algorithms of this kind as linear CF algorithms. 

We find that as a user rates an increasing number of products, the average accuracy of predic- 
tions made by a linear CF algorithm becomes insensitive to manipulated data. For instance, even if 
half of all ratings are provided by manipulators who try to promote half of the products, predictions 
for users with long histories will barely be distorted, on average. To provide some intuition for why 
our results should hold, we now offer an informal argument. A robust CF algorithm should learn 
from its mistakes. In particular, differences between its predictions and actual ratings should help 
improve predictions on future ratings. A linear CF algorithm generates predictions based on a 
probability distribution that is a convex combination of two distributions: one that it would learn 
given only data generated by honest users and one that it would learn given only manipulated data. 
As a user whose ratings we wish to predict provides more ratings, it becomes increasingly clear which of these two distributions better represents his preferences. As a result, the weight placed
on manipulated data diminishes and distortion vanishes. 
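The shrinking weight on manipulated data can be seen in a toy calculation. Conditioning a mixture of an honest model (weight $1 - r$) and a manipulated model (weight $r$) on a user's ratings reweights each component by how well it explains those ratings. The sketch below assumes, purely for illustration, binary ratings and two components that each rate products independently, with hypothetical per-product probabilities `p_honest[n]` and `p_manip[n]` of a positive rating; it is not one of the algorithms studied in this paper.

```python
import math


def manipulated_weight(ratings, p_honest, p_manip, r):
    """Posterior weight on the manipulated component after each successive rating.

    Toy assumptions: 0 < r < 1, binary ratings, and each component treats products
    independently, with p_honest[n] and p_manip[n] the probability of rating product n as 1.
    """
    log_h = math.log(1.0 - r)  # log prior weight of the honest component
    log_m = math.log(r)        # log prior weight of the manipulated component
    weights = [r]
    for n, x in enumerate(ratings):
        # Multiply each component's weight by the likelihood of the new rating.
        log_h += math.log(p_honest[n] if x == 1 else 1.0 - p_honest[n])
        log_m += math.log(p_manip[n] if x == 1 else 1.0 - p_manip[n])
        # Normalize in log space to avoid underflow for long histories.
        top = max(log_h, log_m)
        h, m = math.exp(log_h - top), math.exp(log_m - top)
        weights.append(m / (h + m))
    return weights


# Hypothetical example: honest profiles like each product with probability 0.7,
# manipulated profiles with probability 0.3, and r = 0.1 of the data is manipulated.
ratings = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
weights = manipulated_weight(ratings, [0.7] * 10, [0.3] * 10, r=0.1)
print([round(w, 4) for w in weights])
```

In this example the weight on the manipulated component falls from 0.1 to below 0.001 after ten ratings, rising briefly after each of the two dislikes. Under these assumptions the predicted probability of liking the next product is $(1 - w) \cdot 0.7 + w \cdot 0.3$, so it approaches the honest value 0.7 as $w$ shrinks.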

The main theoretical result of this paper formalizes the above argument. In particular, we will 
define a notion of distortion induced by manipulators and establish an upper bound on distortion, 
which takes a particularly simple form:

$$\text{distortion} \le \frac{1}{n} \ln \frac{1}{1-r}.$$

Here r is the fraction of data that is generated by manipulators and n is the number of products 
that have already been rated by a user whose future ratings we wish to predict. The bound is 
very general. First, it applies to all linear CF algorithms. Second, it applies to all manipulation 
strategies even if manipulators coordinate their actions and produce data with knowledge of all data 
generated by honest users. The bound demonstrates that as the number of prior ratings n increases, 
distortion vanishes. It also identifies the number of prior ratings required to limit distortion to a given level. This
offers guidance for the design of a recommendation system: the system may, for example, assess 
and inform users about the confidence of each recommendation. The system may also require a 
new user to rate a set number of products before making recommendations to him. To put this in 
perspective, consider the following numerical example. Suppose a CF system that accepts binary 
ratings predicts future ratings correctly 80% of the time in the absence of manipulation. If 10% of 
all ratings are provided by manipulators, according to our bound, the system can maintain a 75% 
rate of correct predictions by requiring each new user to rate at least 21 products before receiving 
recommendations. 

To broaden the scope of our analysis, we will also study CF algorithms that behave like linear 
CF algorithms asymptotically as the size of the training set grows. This class of algorithms, which 
we refer to as asymptotically linear, is more flexible in accommodating modeling assumptions that 
may improve prediction accuracy. We will establish that a relaxed version of our distortion bound 
for linear CF algorithms applies to asymptotically linear CF algorithms. 

We will also show that our distortion bound does not generally hold for NN algorithms. Intuitively, this is because prediction errors do not always improve the selection of neighbors. In particular, as a user provides more ratings, manipulated data that contribute to inaccurate predictions of his future ratings may remain in the set of neighbors while data generated by honest users
may be eliminated from it. As a result, distortion of predictions may not decrease. We will later 
provide an example to illustrate this. 

In addition to theoretical results, this paper provides an empirical analysis using a publicly 
available set of movie ratings generated by users of Netflix's recommendation system. We produce 
a distorted version of this data set by injecting manipulated ratings generated using a manipula- 
tion technique studied in prior literature. We then compare results from application of three CF 
algorithms: an NN algorithm, a linear CF algorithm called the kernel density estimation algorithm, 
and an asymptotically linear CF algorithm called the naive Bayes algorithm. Results demonstrate 
that while performance of the NN algorithm is highly susceptible to manipulation, those of kernel 
density estimation and naive Bayes algorithms are relatively robust. In particular, the latter two 
experience distortion lower than the theoretical bound we provide, whereas the distortion for the 
former exceeds it by far. 

One might also wonder whether manipulation robustness of a CF algorithm comes at the ex- 
pense of its prediction accuracy. As an example, consider an algorithm that fixes predictions for 
all ratings to be a constant, without regard to the training data. This algorithm is uninfluenced 
by manipulation but is likely to yield poor predictions, and is therefore not useful. In our experiments, the accuracy of all three algorithms seems reasonable. This suggests that accuracy of a CF algorithm may be achieved alongside robustness.

Our theoretical and empirical results together suggest that commercial recommendation systems 
using NN algorithms can be made more robust by adopting approaches that we describe. Note that 
we are not proposing that real-world systems should implement the specific algorithms we present 
in this paper. Rather, our analysis highlights properties of CF algorithms that lead to robustness 
and practitioners may benefit from taking these properties into consideration when designing CF 
systems. 

This paper is organized as follows. In the next section, we discuss some related work. In Section 3, we formulate a simplified model that serves as a context for studying alternative CF algorithms. We then establish results concerning the manipulation robustness of NN, linear, and asymptotically linear CF algorithms in Section 4. In Section 5, we present our empirical study. We make some closing remarks in a final section.



2 Related Work 



Early research on CF systems focused on their performance in the absence of manipulation [...]. Almost all work on manipulation robustness has been empirical. For example, [..., 40] present studies on product ratings made publicly available by Internet commerce sites. In each case, manipulated ratings were injected, and CF algorithms were tested on the altered data sets. The results point out that NN algorithms and their variants are susceptible to manipulation. This line of work identifies an effective manipulation scheme, which is to create multiple identities and, with each identity, provide positive ratings on products to be promoted while rating other products in a manner indistinguishable from that of honest users. In [..., 40], algorithms based on probabilistic latent semantic analysis and principal component analysis were tested. It turns out that these algorithms are asymptotically linear under certain assumptions about the data, and indeed, empirical results in these papers suggest that they are relatively robust to manipulation. These prior results support the conclusions of our work.

To the best of our knowledge, the only prior theoretical work on manipulation robustness of CF 
algorithms is reported in [31]. This work analyzed an NN algorithm that uses the majority rating
among a set of neighbors as the prediction of a user's rating in an asymptotic regime of many users, 
each of whom rates all products. Manipulators rate as honest users would except on one fixed 
product. A bound is established on the algorithm's prediction error on this product's rating as a 
function of the percentage of ratings provided by manipulators. In our work, we do not require 
users to rate all products and do not constrain manipulators to any particular strategies. Further, 
we study the performance distortion on average, rather than for a single product. Finally, a primary 
contribution of our work is in establishing manipulation robustness of linear and asymptotically 
linear CF algorithms, which turn out to be superior to NN algorithms in this dimension. 

Several researchers have proposed alternative approaches to abating the influence of manipulators. In [33], a mechanism is proposed where users accumulate reputations while providing ratings that are later validated by observed product quality, and a user's influence on ratings predictions is limited by his reputation. In this mechanism, a bound is established on the distortion induced by any finite number of manipulators. In [29], researchers propose leveraging trust relationships among users to weight recommendations and fend off manipulation, while [21, 35, 39] suggest detecting manipulated ratings based on their patterns and discounting their impact. Our work
complements this growing literature. First, additional sources of information can be integrated 
into the probabilistic framework that we introduce in this paper to further enhance manipulation 
robustness. Second, the analytical methods that we develop may be useful for studying the benefits 
of incorporating such information. 

Distortion due to manipulation may also be viewed as a loss of utility in a sequential decision 
problem induced by errors in initial beliefs. Our analysis is based on ideas similar to those that 
have been used to study the latter topic, which is discussed in [13].



More broadly speaking, apart from collaborative filtering, there are other ways to aggregate users' responses to products in order to provide recommendations. Research has been performed on the manipulation robustness of these systems as well. To get a flavor of this line of work, see [...].

3 Model

We now formulate a simplified model that will serve as a context for assessing performance of 
alternative CF algorithms. We will first define the product ratings that we work with and then 
introduce measures of distortion induced by manipulators. For the convenience of the reader, we 
summarize our mathematical notation in a table in Appendix B.

3.1 Ratings Vectors 

In our model, a user selects ratings from a set $\mathcal{S}$. To simplify our discussion, we let $\mathcal{S}$ be a finite subset of $[0, 1]$. For example, $\mathcal{S}$ could be $\{0, 1\}$, with 0 representing a negative rating and 1 representing a positive rating. Note that all the results in this paper can be easily generalized to accommodate any finite set $\mathcal{S}$. There are $N$ products, and a user's type is identified by a vector in $\mathcal{S}^N$. Each $n$th component of this vector reflects how the user would rate the $n$th product after inspecting it.

The CF system has access to ratings provided by $M$ identities who have rated products in the past. The data from each $m$th identity takes the form of a ratings vector $w^m \in \bar{\mathcal{S}}^N$, where $\bar{\mathcal{S}} = \mathcal{S} \cup \{?\}$. Here, an element of $\mathcal{S}$ represents a product rating, whereas a question mark indicates that a product has not been rated. We refer to $W = (w^1, \ldots, w^M) \in \bar{\mathcal{S}}^{N \times M}$ as the training data. This data is used by a CF algorithm to predict future ratings.

Consider a user who is distinct from identities that generated the training data and for whom 
we will generate recommendations. We will refer to such a user as an active user. We will think 
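of a CF algorithm as providing a probability mass function (PMF) $p_{n,x,W}$ over $\mathcal{S}$ for each triplet $(n, x, W)$, where $n \in \{1, \ldots, N\}$ indexes a product, $x \in \bar{\mathcal{S}}^N$ is the active user's ratings vector, and $W$ is the training data. The PMF $p_{n,x,W}$ represents beliefs about how an active user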
who has so far provided ratings x would rate product n after inspecting it. Such an algorithm can 
be used to guide recommendations; for example, the CF system might recommend to the active 
user the product he is most likely to rate highly among those that he has not already rated. 

3.2 Distortion Measures 

To study the influence of manipulation, we consider a situation where a fraction $r$ of the identities are created by manipulators, while the remaining fraction $1 - r$ correspond to distinct honest users. We denote the honest ratings vectors by $y^1, \ldots, y^{(1-r)M} \in \bar{\mathcal{S}}^N$ and the manipulated ratings vectors by $z^1, \ldots, z^{rM} \in \bar{\mathcal{S}}^N$. Let $Y = (y^1, \ldots, y^{(1-r)M})$ and $Z = (z^1, \ldots, z^{rM})$, so that the training data is $W = (Y, Z)$.

To assess distortion of predictions made by a CF algorithm, we consider the following thought experiment. A hypothetical active user begins with a ratings vector $x^0$, with each $n$th component set to $x^0_n = ?$, and inspects products in an order $\nu = (\nu_1, \ldots, \nu_N) \in \sigma_N$, where $\sigma_N$ denotes the set of permutations of $\{1, \ldots, N\}$. After inspecting product $\nu_k$, the user rates it by sampling from the PMF $p_{\nu_k, x^{k-1}, Y}$, and an updated ratings vector $x^k$ is generated by incorporating this new rating in $x^{k-1}$. This stochastic process reflects how we would think honest users behave based on the CF algorithm and uncorrupted data set $Y$. We introduce the following measure of distortion, which we refer to as Kullback-Leibler (KL) distortion:

$$d_n^{\mathrm{KL}}(p, \nu, Y, Z) = \frac{1}{n} \sum_{k=1}^{n} \mathbb{E}\left[ D\left( p_{\nu_k, x^{k-1}, Y} \,\|\, p_{\nu_k, x^{k-1}, (Y,Z)} \right) \right],$$

where $D$ denotes Kullback-Leibler divergence with the natural log. That is, for any two PMFs $p$ and $q$ over support $\mathcal{U}$, $D(p \,\|\, q) = \sum_{u \in \mathcal{U}} p(u) \ln\left( p(u) / q(u) \right)$. This measure of the difference between PMFs is commonly used in information theory.
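For readers who want to experiment with the measure, a direct Python transcription of $D(p \,\|\, q)$ for PMFs on a finite support, represented as dictionaries, might look as follows (the function name and the example PMFs are ours). The last check illustrates a fact that underlies the bounds in Section 4: mixing a fraction $r$ of another PMF into $p$ can never push the divergence from $p$ above $\ln(1/(1-r))$, because the mixture assigns every outcome at least $1 - r$ times its probability under $p$.

```python
import math


def kl_divergence(p, q):
    """D(p || q) = sum_u p(u) * ln(p(u) / q(u)), with PMFs given as dicts."""
    total = 0.0
    for u, pu in p.items():
        if pu == 0.0:
            continue  # terms with p(u) = 0 contribute nothing
        qu = q.get(u, 0.0)
        if qu == 0.0:
            return math.inf  # p puts mass where q puts none
        total += pu * math.log(pu / qu)
    return total


# Honest prediction p, a hypothetical manipulated prediction q, and their mixture.
p = {0: 0.8, 1: 0.2}
q = {0: 0.1, 1: 0.9}
r = 0.1
mix = {u: (1 - r) * p[u] + r * q[u] for u in p}
print(kl_divergence(p, mix), math.log(1 / (1 - r)))
```

With these numbers the divergence is about 0.013, well below $\ln(1/0.9) \approx 0.105$.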

For each $k$, the PMF $p_{\nu_k, x^{k-1}, Y}$ represents the prediction that would be made in the absence of manipulators, whereas $p_{\nu_k, x^{k-1}, (Y,Z)}$ is what it becomes as a consequence of manipulation. Hence, $D\left( p_{\nu_k, x^{k-1}, Y} \,\|\, p_{\nu_k, x^{k-1}, (Y,Z)} \right)$ measures the extent to which the manipulated data $Z$ influences the prediction. We take the expectation of this quantity, with $x^{k-1}$ distributed as the CF algorithm would have predicted if the data set were not corrupted by manipulated data. KL distortion $d_n^{\mathrm{KL}}(p, \nu, Y, Z)$ averages these terms over the first $n$ inspected products.

Some algorithms such as NN algorithms generate predictions not in the form of PMFs, but as scalars that may be interpreted as the means of PMFs. For these algorithms, it may be more suitable to measure manipulation impact in terms of root-mean-squared (RMS) distortion:

$$d_n^{\mathrm{RMS}}(p, \nu, Y, Z) = \sqrt{ \frac{1}{n} \sum_{k=1}^{n} \mathbb{E}\left[ \left( \hat{x}_{\nu_k, x^{k-1}, Y} - \hat{x}_{\nu_k, x^{k-1}, (Y,Z)} \right)^2 \right] },$$

where $\hat{x}_{\nu_k, x^{k-1}, Y}$ and $\hat{x}_{\nu_k, x^{k-1}, (Y,Z)}$ denote the scalar predictions of $x_{\nu_k}$ by the algorithm based on ratings history $x^{k-1}$ and data sets $Y$ and $(Y, Z)$, respectively. Note that if the algorithm generates PMFs as predictions, $\hat{x}_{\nu_k, x^{k-1}, Y}$ and $\hat{x}_{\nu_k, x^{k-1}, (Y,Z)}$ would be expectations of $x_{\nu_k}$ taken with respect to $p_{\nu_k, x^{k-1}, Y}$ and $p_{\nu_k, x^{k-1}, (Y,Z)}$, respectively. The expectation in the definition of RMS distortion is taken with $x^{k-1}$ distributed as the CF algorithm would have predicted based on $Y$. RMS distortion may offer a more transparent assessment than KL distortion because the former computes how much scalar predictions change in the same unit as the predictions themselves. RMS distortion is bounded by a function of KL distortion:

$$d_n^{\mathrm{RMS}}(p, \nu, Y, Z) \le \sqrt{ \tfrac{1}{2}\, d_n^{\mathrm{KL}}(p, \nu, Y, Z) }.$$

This is shown in Proposition 1 in Appendix A.1.
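One way to see why such a bound holds is a per-prediction fact that is easy to check numerically: for two PMFs supported in $[0, 1]$, the difference of their means is at most their total variation distance, which by Pinsker's inequality is at most $\sqrt{D(p \,\|\, q)/2}$; squaring, averaging over $k$, and taking a square root then gives the inequality above. The snippet below checks the per-prediction chain on a hypothetical pair of PMFs over $\mathcal{S} = \{0, 0.5, 1\}$. It is an illustration only, not the proof given in the appendix.

```python
import math


def kl(p, q):
    """KL divergence D(p || q); assumes q(u) > 0 wherever p(u) > 0."""
    return sum(pu * math.log(pu / q[u]) for u, pu in p.items() if pu > 0.0)


def mean(p):
    """Scalar prediction: the mean of a PMF over ratings."""
    return sum(u * pu for u, pu in p.items())


def total_variation(p, q):
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(u, 0.0) - q.get(u, 0.0)) for u in support)


p = {0.0: 0.2, 0.5: 0.3, 1.0: 0.5}  # hypothetical prediction based on Y
q = {0.0: 0.4, 0.5: 0.4, 1.0: 0.2}  # hypothetical prediction based on (Y, Z)
gap = abs(mean(p) - mean(q))
tv = total_variation(p, q)
pinsker = math.sqrt(kl(p, q) / 2.0)
print(gap, tv, pinsker)  # here gap <= tv <= pinsker
```

For this pair the mean shift is 0.25, the total variation is 0.3, and the Pinsker bound is about 0.34.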

To offer an intuitive interpretation for RMS distortion, we consider a setting where users provide binary ratings and the CF system offers binary predictions based on the PMFs that it generates. That is, we set $\mathcal{S} = \{0, 1\}$. Given training data $Y$, for a user with ratings history $x^{k-1}$, the system generates a prediction of $\tilde{x}_{\nu_k, x^{k-1}, Y} = 1$ for product $\nu_k$ if $p_{\nu_k, x^{k-1}, Y}(1) \ge 1/2$ and generates a prediction of $\tilde{x}_{\nu_k, x^{k-1}, Y} = 0$ otherwise. Similarly, we denote by $\tilde{x}_{\nu_k, x^{k-1}, (Y,Z)}$ the binary prediction based on $(Y, Z)$. We define the following binary prediction distortion:

$$d_n^{\mathrm{B}}(p, \nu, Y, Z) = \frac{1}{n} \sum_{k=1}^{n} \left( \Pr\left[ \tilde{x}_{\nu_k, x^{k-1}, Y} = x_{\nu_k} \right] - \Pr\left[ \tilde{x}_{\nu_k, x^{k-1}, (Y,Z)} = x_{\nu_k} \right] \right),$$

where each $x_{\nu_k}$ is distributed according to $p_{\nu_k, x^{k-1}, Y}$ and $x^{k-1}$ is distributed as the CF algorithm would have predicted based on $Y$. This quantity captures the average decrease in the probability of correct predictions induced by manipulation. It turns out that binary prediction distortion is bounded by RMS distortion:

$$d_n^{\mathrm{B}}(p, \nu, Y, Z) \le d_n^{\mathrm{RMS}}(p, \nu, Y, Z). \qquad (1)$$

Proved in Proposition 5 in Appendix A.1, this result offers an interpretation of RMS distortion as an upper bound on the drop in the probability of correct predictions in a binary setting.

One might wonder why we choose our particular distortion measures over other candidates. 
For instance, one option is to consider the top n most desirable products based on predictions, 
and define as distortion some measure of their quality change due to the manipulated samples. 
One reason why we prefer KL and RMS distortions is that they are convex functions of predictions 
while this measure is not. As such, this measure is difficult to analyze. Further, as will be discussed 
in Section 5.6, in a recent competition of CF algorithms, Netflix used RMS error to assess the prediction accuracy of competing algorithms [32]. This suggests that commercial CF algorithms are typically designed to
minimize convex measures of error. Our choice of distortion measures is in line with this approach. 

Another option one might consider is to measure the worst distortion over all products. This may 
be a reasonable choice if there is one manipulator interested in distorting ratings of one product. 
Since we model a situation where the CF system does not know the number or the objectives 
of manipulators, however, we would like to characterize the overall distortion experienced by all products, and KL and RMS distortions capture that better than the worst distortion does. Note that one implication of our choice is that our robustness results will pertain to the overall distortion, rather than distortion on individual product ratings. As such, our algorithms will not provide guarantees on whether any particular individual product's ratings will be influenced significantly by manipulators.

4 Collaborative Filtering Algorithms 

In this section, we first introduce the notion of probabilistic CF algorithms. We then describe two 
classes of such algorithms, namely linear and asymptotically linear CF algorithms, and analyze their 
robustness to manipulation. Finally, we discuss nearest neighbor algorithms and their susceptibility 
to manipulation. 

4.1 Probabilistic Collaborative Filtering Algorithms 

A probabilistic CF algorithm carries out predictions based on a probabilistic model of how the training data is generated. We will model training data as being generated in the following way. First, user types $u^m \in \mathcal{S}^N$ are sampled i.i.d. from some PMF. Then, $w^m \in \bar{\mathcal{S}}^N$ is sampled from a conditional PMF, conditioned on $u^m$, which for each $n$ assigns either $w^m_n = ?$ or $w^m_n = u^m_n$. Note that this model allows for dependence between the type of a user and the products he chooses to rate. This accommodates, for example, systems in which users tend to inspect and rate only products that they care for. Given a PMF $\psi$ over $\mathcal{S}^N \times \bar{\mathcal{S}}^N$, we denote by $\psi_{\mathcal{S}}$ and $\psi_{\bar{\mathcal{S}}}$ the marginal PMFs over $\mathcal{S}^N$ and $\bar{\mathcal{S}}^N$, respectively.
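The two-stage generative model can be simulated directly. In this sketch (function names, type PMF, and observation rule all hypothetical), types are drawn from an explicit PMF over complete ratings vectors and, as a simplifying assumption beyond the model above, each product is rated independently with a probability `observe_prob(u, n)` that may depend on the user's type; unrated entries are represented by the string "?".

```python
import random


def sample_training_data(type_pmf, observe_prob, num_identities, rng):
    """Draw ratings vectors: sample a type u, then replace unrated entries with "?"."""
    types = list(type_pmf)
    weights = [type_pmf[u] for u in types]
    data = []
    for _ in range(num_identities):
        u = rng.choices(types, weights=weights, k=1)[0]
        w = tuple(u[n] if rng.random() < observe_prob(u, n) else "?" for n in range(len(u)))
        data.append(w)
    return data


def rates_only_liked(u, n):
    """Hypothetical observation rule: a user rates exactly the products he likes."""
    return 1.0 if u[n] == 1 else 0.0


type_pmf = {(1, 1): 0.3, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.3}
data = sample_training_data(type_pmf, rates_only_liked, num_identities=20, rng=random.Random(0))
print(data[:5])
```

Because `rates_only_liked` depends on the type, every observed rating in `data` is a 1: the kind of type-dependent missingness the model is designed to allow.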

We will call a CF algorithm $p$ probabilistic if for each $W$ there exists a PMF $\psi^{p,W}$ over $\mathcal{S}^N \times \bar{\mathcal{S}}^N$ such that for each $n$ and $x$, $p_{n,x,W}$ is the marginal PMF of $u_n$ conditioned on $x$, with respect to the joint PMF $\psi^{p,W}$. From here on, we will denote by $\psi^{p,W}_{\mathcal{S}}$ the marginal PMF over $\mathcal{S}^N$ of the joint PMF $\psi^{p,W}$ corresponding to a probabilistic CF algorithm $p$ and training set $W$.
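For a tiny number of products, the defining property of a probabilistic CF algorithm can be made concrete: store a PMF over complete type vectors explicitly and obtain $p_{n,x,W}$ by conditioning on the ratings the active user has already provided. The sketch below does this by enumeration and then recommends the unrated product with the highest predicted mean rating, in the spirit of the recommendation rule mentioned in Section 3.1. The type PMF is a hypothetical stand-in for the marginal $\psi^{p,W}_{\mathcal{S}}$ that an actual algorithm would learn from $W$, the active user's ratings are passed as a dictionary with unrated products omitted, and enumeration is exponential in $N$, so this only illustrates the definition.

```python
def predictive_pmf(type_pmf, observed, n, support=(0, 1)):
    """p_{n,x,W}: PMF of u_n given that the type agrees with the observed ratings x."""
    mass = {s: 0.0 for s in support}
    for u, prob in type_pmf.items():
        if all(u[i] == v for i, v in observed.items()):
            mass[u[n]] += prob
    total = sum(mass.values())
    if total == 0.0:
        raise ValueError("observed ratings have zero probability under this PMF")
    return {s: m / total for s, m in mass.items()}


def recommend(type_pmf, observed, num_products, support=(0, 1)):
    """Unrated product with the highest predicted mean rating."""
    def predicted_mean(n):
        pmf = predictive_pmf(type_pmf, observed, n, support)
        return sum(s * prob for s, prob in pmf.items())
    unrated = [n for n in range(num_products) if n not in observed]
    return max(unrated, key=predicted_mean)


type_pmf = {(1, 1, 0): 0.4, (1, 0, 1): 0.1, (0, 0, 1): 0.4, (0, 1, 0): 0.1}
observed = {0: 1}  # the active user has rated product 0 positively
print(predictive_pmf(type_pmf, observed, 1))  # roughly {0: 0.2, 1: 0.8}
print(recommend(type_pmf, observed, num_products=3))  # 1
```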
4.2 Linear Collaborative Filtering Algorithms

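We say that a probabilistic CF algorithm $p$ is linear if for any $W_1 \in \bar{\mathcal{S}}^{N \times M_1}$ and $W_2 \in \bar{\mathcal{S}}^{N \times M_2}$,

$$\psi^{p,(W_1, W_2)} = \frac{M_1}{M_1 + M_2}\, \psi^{p,W_1} + \frac{M_2}{M_1 + M_2}\, \psi^{p,W_2}.$$

This definition states that the PMF $\psi^{p,(W_1,W_2)}$ that a linear CF algorithm $p$ generates based on training data $(W_1, W_2)$ is a convex combination of two PMFs: namely, the PMF $\psi^{p,W_1}$ that it generates based on $W_1$ and the PMF $\psi^{p,W_2}$ that it generates based on $W_2$.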
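One simple way to obtain a linear algorithm is to let each identity contribute its own PMF and average the contributions: an average over the concatenated data $(W_1, W_2)$ is automatically the $M_1/(M_1 + M_2)$ and $M_2/(M_1 + M_2)$ weighted combination of the two separate averages. The hypothetical sketch below does this for binary ratings, with each ratings vector contributing a smoothed product-form PMF over types (unrated entries are treated as uniform, and `eps` is an arbitrary smoothing parameter). For brevity it builds only a PMF over types rather than the joint PMF over types and ratings vectors, and it is not the kernel density estimation algorithm evaluated in Section 5; it then checks the linearity identity numerically.

```python
from itertools import product


def kernel(w, eps=0.1):
    """PMF over binary type vectors contributed by one ratings vector w ("?" = unrated)."""
    pmf = {}
    for u in product((0, 1), repeat=len(w)):
        prob = 1.0
        for u_n, w_n in zip(u, w):
            prob *= 0.5 if w_n == "?" else (1.0 - eps if u_n == w_n else eps)
        pmf[u] = prob
    return pmf


def linear_cf(data, eps=0.1):
    """Average of the per-identity kernels: a toy linear CF algorithm."""
    psi = {}
    for w in data:
        for u, prob in kernel(w, eps).items():
            psi[u] = psi.get(u, 0.0) + prob / len(data)
    return psi


W1 = [(1, 0, "?"), (1, 1, 0)]  # M1 = 2 hypothetical ratings vectors
W2 = [(0, "?", 1)]             # M2 = 1 hypothetical ratings vector
combined = linear_cf(W1 + W2)
psi1, psi2 = linear_cf(W1), linear_cf(W2)
gap = max(abs(combined[u] - (2 / 3 * psi1[u] + 1 / 3 * psi2[u])) for u in combined)
print(gap)  # zero up to floating-point rounding
```

Any algorithm of this "average of per-identity contributions" form satisfies the definition, whatever kernel is used.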

We now examine the KL distortion that manipulators can induce on a linear CF algorithm. 
Consider training data W = (Y, Z) consisting of ratings vectors Y from honest users and Z from 
manipulators, with the latter making up a fraction r of the training data. The following theorem, 
which is the main theoretical contribution of this paper, establishes a bound on the resulting KL 
distortion. 

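Theorem 1. Fix the number of products $N$ and let $p$ be a linear CF algorithm. Then, for all $M$, $r \in \{0, 1/M, \ldots, (M-1)/M\}$, $Y \in \bar{\mathcal{S}}^{N \times (1-r)M}$, $Z \in \bar{\mathcal{S}}^{N \times rM}$, and $\nu \in \sigma_N$,

$$d_n^{\mathrm{KL}}(p, \nu, Y, Z) \le \frac{1}{n} \ln \frac{1}{1-r}.$$

This result is proved in Appendix A.2.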

Note that the bound only depends on the number of active user ratings $n$ and the fraction of data $r$ generated by manipulators. Hence, it represents a worst-case bound over all linear CF algorithms $p$, the number of products $N$, the quantity $M$, values $(Y, Z)$ of the training data, and the order in which the active user rates products. This means, for example, that it applies even if manipulators coordinate with each other and select ratings with knowledge of the specific CF algorithm $p$, the honest ratings $Y$, and the ordering $\nu$. This also makes the bound relevant for realistic models of how a recommendation system might sequence products for a user; for example, each $\nu_k$ could be the product that the CF algorithm predicts as being most desirable among remaining ones after the user has inspected products $\nu_1, \ldots, \nu_{k-1}$.

Note that KL distortion vanishes as the number $n$ of products rated by the active user increases. To develop intuition for why this happens, we now offer an informal argument. Observe that $\psi^{p,(Y,Z)} = (1-r)\psi^{p,Y} + r\psi^{p,Z}$. If $\psi^{p,Y}$ is identical to $\psi^{p,Z}$, then $\psi^{p,(Y,Z)}$ is equal to $\psi^{p,Y}$ and distortion will be zero. Otherwise, if $\psi^{p,Y}$ and $\psi^{p,Z}$ are different, as an active user inspects and rates products in the manner that we define, his ratings will tend to be distinguished as sampled from $\psi^{p,Y}$ rather than from $\psi^{p,Z}$. As such, the influence of $Z$ on predictions diminishes as $n$ grows.
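The informal argument can also be made precise in a few lines, which shows where the $\ln(1/(1-r))$ term comes from. The following is a sketch rather than the proof in Appendix A.2, and it assumes, as in the definition of a probabilistic CF algorithm, that the predictions are the sequential conditionals of the type marginals $\psi^{p,Y}_{\mathcal{S}}$ and $\psi^{p,(Y,Z)}_{\mathcal{S}}$. Write $P_Y$ and $P_{(Y,Z)}$ (notation introduced here) for the PMFs that these marginals assign to the ratings $(x_{\nu_1}, \ldots, x_{\nu_n})$ of the first $n$ inspected products. The chain rule for KL divergence collapses the sum in the definition of KL distortion into a single divergence, and linearity gives $P_{(Y,Z)} = (1-r)P_Y + rP_Z \ge (1-r)P_Y$ pointwise:

$$n\, d_n^{\mathrm{KL}}(p, \nu, Y, Z) = \sum_{k=1}^{n} \mathbb{E}\left[ D\left( p_{\nu_k, x^{k-1}, Y} \,\|\, p_{\nu_k, x^{k-1}, (Y,Z)} \right) \right] = D\left( P_Y \,\|\, P_{(Y,Z)} \right) = \mathbb{E}_{P_Y}\left[ \ln \frac{P_Y(x_{\nu_1}, \ldots, x_{\nu_n})}{P_{(Y,Z)}(x_{\nu_1}, \ldots, x_{\nu_n})} \right] \le \ln \frac{1}{1-r}.$$

Dividing by $n$ yields the bound of Theorem 1. The total divergence over all $n$ predictions is capped by a constant that no choice of $Z$ can raise, so its average over $n$ predictions must shrink like $1/n$.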

The bound depends on $r$ through the term $\ln(1/(1-r))$. This term captures the dependence of
KL distortion on the fraction of data produced by manipulators. As one would expect, this term 
vanishes when r is set to zero. 

As a corollary of Theorem 1 and Proposition 1, we have the following bound on RMS distortion.

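Corollary 1. Fix the number of products $N$ and let $p$ be a linear CF algorithm. Then, for all $M$, $r \in \{0, 1/M, \ldots, (M-1)/M\}$, $Y \in \bar{\mathcal{S}}^{N \times (1-r)M}$, $Z \in \bar{\mathcal{S}}^{N \times rM}$, and $\nu \in \sigma_N$,

$$d_n^{\mathrm{RMS}}(p, \nu, Y, Z) \le \sqrt{ \frac{1}{2n} \ln \frac{1}{1-r} }.$$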

Figure 1 illustrates how this bound depends on $r$ and $n$. The bound can offer useful guidance. For example, it ensures that if an active user has rated 22 products and no more than 10% of the training data is manipulated, then the RMS distortion induced by manipulators is less than 0.05. In a setting where users provide binary ratings and the system generates binary predictions, according to our bound on binary prediction distortion in (1) in Section 3.2, the average probability of correct predictions decreases by at most 0.05. Hence, if a binary CF system predicts ratings correctly 80% of the time in the absence of manipulation, it can maintain this probability at 75% in the presence of manipulation if it requires active users to rate 21 products before receiving recommendations.
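The arithmetic behind these numbers is easy to reproduce. The following sketch (function names hypothetical) evaluates the bounds of Theorem 1 and Corollary 1 and solves the Corollary 1 bound for the smallest $n$ that keeps RMS distortion at or below a target. For $r = 0.1$ and a target of 0.05, the RMS bound first drops below 0.05 at $n = 22$, the figure quoted above.

```python
import math


def kl_bound(n, r):
    """Theorem 1: KL distortion <= (1/n) ln(1/(1 - r))."""
    return math.log(1.0 / (1.0 - r)) / n


def rms_bound(n, r):
    """Corollary 1: RMS distortion <= sqrt((1/(2n)) ln(1/(1 - r)))."""
    return math.sqrt(kl_bound(n, r) / 2.0)


def min_ratings(r, target):
    """Smallest n for which the RMS bound is at most `target`."""
    # rms_bound(n, r) <= target  <=>  n >= ln(1/(1 - r)) / (2 * target**2)
    return math.ceil(math.log(1.0 / (1.0 - r)) / (2.0 * target ** 2))


print(min_ratings(0.1, 0.05))          # 22
print(round(rms_bound(21, 0.1), 4))    # 0.0501
print(round(rms_bound(22, 0.1), 4))    # 0.0489
```

Because the bound scales as $1/\sqrt{n}$, halving the tolerated RMS distortion roughly quadruples the number of ratings required.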

We will introduce examples of linear CF algorithms in Section 5.3.

4.3 Asymptotically Linear Collaborative Filtering Algorithms 

We say that a probabilistic CF algorithm $p$ is asymptotically linear if for all PMFs $\psi, \phi$ over $\mathcal{S}^N \times \bar{\mathcal{S}}^N$, $r \in [0, 1]$, and $\epsilon > 0$,

$$\lim_{m \to \infty} \Pr\left( D\left( (1-r)\psi^{p,U_m} + r\psi^{p,V_m} \,\|\, \psi^{p,(U_m, V_m)} \right) > \epsilon \right) = 0,$$

Figure 1: Bound on $d_n^{\mathrm{RMS}}(p, \nu, Y, Z)$ as a function of $n$. The four curves from bottom to top are for cases where $r = 0.01$, $0.05$, $0.1$, and $0.2$, respectively.

where for each $m$, $U_m = (u^1, \ldots, u^{m-I}) \in \bar{\mathcal{S}}^{N \times (m-I)}$ and $V_m = (v^1, \ldots, v^I) \in \bar{\mathcal{S}}^{N \times I}$, $I \sim \mathrm{Binomial}(m, r)$, and $u^1, \ldots, u^{m-I} \sim \psi_{\bar{\mathcal{S}}}$ and $v^1, \ldots, v^I \sim \phi_{\bar{\mathcal{S}}}$ are i.i.d. sequences.

To understand the preceding definition, we think of training data $(U_m, V_m)$ for each $m$ as generated in the following way: with probability $1 - r$, a ratings vector is sampled from $\psi_{\bar{\mathcal{S}}}$, which we denote as $u^i$ for an appropriate $i$, and with probability $r$, a ratings vector is sampled from $\phi_{\bar{\mathcal{S}}}$, which we denote as $v^i$ for an appropriate $i$. As $m$ grows, an asymptotically linear CF algorithm behaves like a linear CF algorithm in that the PMF that it generates based on data $(U_m, V_m)$ converges in probability to a convex combination of two PMFs: namely, the PMF $\psi^{p,U_m}$ that it would generate based on the $\psi_{\bar{\mathcal{S}}}$-sampled set $U_m$ and the PMF $\psi^{p,V_m}$ that it would generate based on the $\phi_{\bar{\mathcal{S}}}$-sampled set $V_m$. By an application of the weak law of large numbers, it can be shown that all linear CF algorithms are asymptotically linear.

We can also show that asymptotically linear CF algorithms are asymptotically robust, in a 
sense to be made precise later. It turns out that this result applies to a broader range of practical 
algorithms that are asymptotically linear in a more restricted sense, which we now define. Consider 
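a set $\Psi$ of joint PMFs over $\mathcal{S}^N \times \bar{\mathcal{S}}^N$. We say that a probabilistic CF algorithm $p$ is asymptotically linear with respect to $\Psi$ if for all PMFs $\psi, \phi \in \Psi$, $r \in [0, 1]$, and $\epsilon > 0$,

$$\lim_{m \to \infty} \Pr\left( D\left( (1-r)\psi^{p,U_m} + r\psi^{p,V_m} \,\|\, \psi^{p,(U_m, V_m)} \right) > \epsilon \right) = 0,$$

where for each $m$, $U_m = (u^1, \ldots, u^{m-I}) \in \bar{\mathcal{S}}^{N \times (m-I)}$ and $V_m = (v^1, \ldots, v^I) \in \bar{\mathcal{S}}^{N \times I}$, $I \sim \mathrm{Binomial}(m, r)$, and $u^1, \ldots, u^{m-I} \sim \psi_{\bar{\mathcal{S}}}$ and $v^1, \ldots, v^I \sim \phi_{\bar{\mathcal{S}}}$ are i.i.d. sequences.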

The following theorem and corollary characterize the robustness of asymptotically linear CF 
algorithms. 

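Theorem 2. Fix the number of products $N$ and a set $\Psi$ of joint PMFs over $\mathcal{S}^N \times \bar{\mathcal{S}}^N$. Let $p$ be a CF algorithm asymptotically linear with respect to $\Psi$. Then, for all $\mu^*, \pi^* \in \Psi$, $r \in [0, 1)$, $\nu \in \sigma_N$, $n \in \{1, \ldots, N\}$, and $\epsilon > 0$,

$$\lim_{m \to \infty} \Pr\left( d_n^{\mathrm{KL}}(p, \nu, Y_m, Z_m) > \frac{1}{n} \ln \frac{1}{1-r} + \epsilon \right) = 0,$$

where, for each $m$, $Y_m = (y^1, \ldots, y^{m-I}) \in \bar{\mathcal{S}}^{N \times (m-I)}$, $Z_m = (z^1, \ldots, z^I) \in \bar{\mathcal{S}}^{N \times I}$, $I \sim \mathrm{Binomial}(m, r)$, and $y^1, \ldots, y^{m-I} \sim \mu^*_{\bar{\mathcal{S}}}$ and $z^1, \ldots, z^I \sim \pi^*_{\bar{\mathcal{S}}}$ are i.i.d. sequences.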

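Corollary 2. Fix the number of products $N$ and a set $\Psi$ of joint PMFs over $\mathcal{S}^N \times \bar{\mathcal{S}}^N$. Let $p$ be a CF algorithm asymptotically linear with respect to $\Psi$. Then, for all $\mu^*, \pi^* \in \Psi$, $r \in [0, 1)$, $\nu \in \sigma_N$, $n \in \{1, \ldots, N\}$, and $\epsilon > 0$,

$$\lim_{m \to \infty} \Pr\left( d_n^{\mathrm{RMS}}(p, \nu, Y_m, Z_m) > \sqrt{ \frac{1}{2n} \ln \frac{1}{1-r} } + \epsilon \right) = 0,$$

where, for each $m$, $Y_m = (y^1, \ldots, y^{m-I}) \in \bar{\mathcal{S}}^{N \times (m-I)}$, $Z_m = (z^1, \ldots, z^I) \in \bar{\mathcal{S}}^{N \times I}$, $I \sim \mathrm{Binomial}(m, r)$, and $y^1, \ldots, y^{m-I} \sim \mu^*_{\bar{\mathcal{S}}}$ and $z^1, \ldots, z^I \sim \pi^*_{\bar{\mathcal{S}}}$ are i.i.d. sequences.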

Theorem 2 is proved in Appendix A.3, and Corollary 2 follows from Theorem 2 and Proposition 1. These results state that for any CF algorithm $p$ asymptotically linear with respect to $\Psi$ and any fixed PMFs $\mu^*, \pi^* \in \Psi$, as honest users and manipulators sample more data from them, with high probability the distortion bounds for linear CF algorithms in Theorem 1 and Corollary 1 will also apply to $p$, up to an arbitrarily small slack $\epsilon$; in particular, distortion will vanish as $n$ grows.

The intuition behind these results is similar to that for Theorem 1. In particular, given sufficient data, the learned PMF $\psi^{p,(Y_m,Z_m)}_{\mathcal{S}}$ should closely approximate $(1-r)\mu^*_{\mathcal{S}} + r\pi^*_{\mathcal{S}}$. If $\mu^*_{\mathcal{S}}$ and $\pi^*_{\mathcal{S}}$ are similar, then $\psi^{p,(Y_m,Z_m)}_{\mathcal{S}}$ should be close to $\mu^*_{\mathcal{S}}$ and distortion should be close to zero. On the other hand, if $\mu^*_{\mathcal{S}}$ and $\pi^*_{\mathcal{S}}$ are significantly different, as an active user provides more ratings, it will be increasingly clear that his ratings are sampled from $\mu^*_{\mathcal{S}}$ rather than $\pi^*_{\mathcal{S}}$, and distortion will diminish.

We now study a class of asymptotically linear CF algorithms, which converge to the true PMF of user types under certain assumptions about the training data. For starters, we say a set $\Psi$ of PMFs over $\mathcal{S}^N \times \bar{\mathcal{S}}^N$ is identifiable if all distinct PMFs $\psi, \phi \in \Psi$ have distinct ratings marginals $\psi_{\bar{\mathcal{S}}}$ and $\phi_{\bar{\mathcal{S}}}$. Given an identifiable set $\Psi$, we say that a probabilistic CF algorithm $p$ is consistent with respect to $\Psi$ if for all $\psi \in \Psi$ and $\epsilon > 0$,

$$\lim_{m \to \infty} \Pr\left( D\left( \psi \,\|\, \psi^{p,W_m} \right) > \epsilon \right) = 0,$$

where for each $m$, $W_m = (w^1, \ldots, w^m) \in \bar{\mathcal{S}}^{N \times m}$ is generated independently and, in particular, $w^1, \ldots, w^m \sim \psi_{\bar{\mathcal{S}}}$ is an i.i.d. sequence. This definition is meant to capture algorithms that converge to $\psi_{\bar{\mathcal{S}}}$ and recover its unique corresponding joint PMF $\psi \in \Psi$ and type marginal $\psi_{\mathcal{S}}$ as the $\psi_{\bar{\mathcal{S}}}$-sampled training data grows. The following theorem states the setting in which a consistent CF algorithm is asymptotically linear.

Theorem 3. Any probabilistic CF algorithm consistent with respect to an identifiable and convex set $\Psi$ is asymptotically linear with respect to $\Psi$.

The preceding result, proved in Appendix A.3, together with the definition of consistent algorithms and Theorem 2, implies that if data are sampled i.i.d. from some PMF $\psi$ in an identifiable and convex set $\Psi$, then an algorithm consistent with respect to $\Psi$ provides guarantees on both prediction accuracy and robustness to manipulation as training data grows. In practice, even if it is unclear whether the identifiability and convexity conditions hold, one might still apply a consistent CF algorithm as a starting point, in the hope that it will deliver reasonable accuracy and robustness. In Section 5, we will empirically evaluate a consistent CF algorithm called the naive Bayes algorithm.
4.4 Nearest Neighbor Algorithms

Nearest neighbor algorithms, widely used in commercial CF systems [18, 34], generally come in two classes. The first class predicts a user's ratings based on those provided by similar users,
referred to as neighbors. The second class makes predictions on a product based on ratings that 
the user has provided on similar products, which can also be viewed as neighbors. In this section, 
we study a simple NN algorithm of the first class and the extent to which its predictions can be 
distorted by manipulators. We show that the bounds of the previous section do not apply to this 
NN algorithm, and unlike the case of linear CF algorithms, distortion does not generally diminish 
as the active user inspects and rates products. Though our analysis focuses on a particular NN 
algorithm, the resulting insights apply more broadly and in particular, to NN algorithms of the 
second class as well. 

We study the case of binary ratings. NN algorithms identify and weight neighbors using a similarity measure. We will consider a similarity measure that increases by one for each pair of consistent ratings and decreases by one for each pair of inconsistent ratings:

$$s(x,y) = \left|\{1 \le n \le N : x_n = y_n \neq ?\}\right| - \left|\{1 \le n \le N : ? \neq x_n \neq y_n \neq ?\}\right|,$$

for any pair of ratings vectors $x, y \in \bar{S}^N$.
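As a quick illustration, the following Python sketch computes $s(x,y)$. It assumes, purely for illustration, that a ratings vector is a list in which `None` plays the role of the question mark; the name `similarity` is ours.

```python
from typing import Optional, Sequence

Rating = Optional[float]  # None encodes the question mark "?"


def similarity(x: Sequence[Rating], y: Sequence[Rating]) -> int:
    """s(x, y): +1 for each product rated identically by both vectors,
    -1 for each product rated differently by both; '?' entries are skipped."""
    if len(x) != len(y):
        raise ValueError("ratings vectors must have the same length N")
    score = 0
    for xn, yn in zip(x, y):
        if xn is None or yn is None:
            continue  # at least one rating is '?', so the pair does not count
        score += 1 if xn == yn else -1
    return score


# One consistent pair (product 1) and one inconsistent pair (product 2).
print(similarity([1, 0, None, 1], [1, 1, None, None]))  # → 0
```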

We consider an NN algorithm that predicts the future rating of product $n$ for a user with ratings vector $x$ by carrying out the following steps. First, the algorithm identifies the subset of the training data samples that offer ratings for product $n$. If this subset is empty, the NN algorithm optimistically predicts a rating of 1. Otherwise, from among these ratings vectors, the ones most similar to $x$ are identified. We denote the resulting set of neighbors, which should be a singleton unless there is a tie, by $\mathcal{N}(n,x,W)$. Finally, an average of their ratings for product $n$ forms the prediction:

$$\hat{x}_{n,x,W} = \frac{\sum_{w \in \mathcal{N}(n,x,W)} w_n}{|\mathcal{N}(n,x,W)|}.$$

Our observations extend to other, more complicated similarity metrics and neighbor selection methods. However, we focus on this particular case in order to keep our analysis clean.
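The three steps above can be sketched in Python as follows. This is a minimal illustration under the same assumed encoding (`None` for a question mark) and uses hypothetical function names; products are indexed from 0 in the code rather than from 1 as in the text.

```python
from typing import List, Optional, Sequence

Rating = Optional[float]  # None encodes "?"


def similarity(x: Sequence[Rating], y: Sequence[Rating]) -> int:
    # +1 per product rated identically by both vectors, -1 per product rated differently
    return sum(1 if a == b else -1
               for a, b in zip(x, y) if a is not None and b is not None)


def nn_predict(n: int, x: Sequence[Rating], W: Sequence[Sequence[Rating]]) -> float:
    """Prediction of the simple NN algorithm for product n (0-based index)."""
    # Step 1: keep the training vectors that rate product n.
    candidates: List[Sequence[Rating]] = [w for w in W if w[n] is not None]
    if not candidates:
        return 1.0  # optimistic default when no training vector rates product n
    # Step 2: keep the candidates most similar to x; ties are all kept.
    scores = [similarity(x, w) for w in candidates]
    best = max(scores)
    neighbors = [w for w, s in zip(candidates, scores) if s == best]
    # Step 3: average the neighbors' ratings of product n.
    return sum(w[n] for w in neighbors) / len(neighbors)
```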
We now consider a simple setting that facilitates analysis of RMS distortion in our NN algorithm. We are interested in how RMS distortion changes as the number of ratings $n$ provided by an active user grows. Since $n$ cannot exceed the number of products $N$, we will define an ensemble of models indexed by $N$. To facilitate our construction, we will only consider even $N$.

To keep things simple, we restrict attention to a situation where honest users agree on the ratings of all products. In particular, there is a single user type $x^*$, which rates odd-indexed products 1 and even-indexed products 0. The user type PMF assigns all probability to this vector. Each honest ratings vector $y^m$ is generated by sampling a random set of odd numbers between 1 and $N-1$, then, for each sampled $k$, replacing components $k$ and $k+1$ of $x^*$ with question marks. We assume that the honest ratings $Y$ of the training data are such that each set of odd numbers between 1 and $N-1$ is sampled exactly once. That is, each element of $Y$ corresponds to an element of the set $\{(1,0), (?,?)\}^{N/2}$. As such, there are $2^{N/2}$ honest ratings vectors.

Recalling the setting that we use for assessing distortion, we now consider an active user who inspects products in the ordering $\nu = (1, \dots, N)$, rating each based on the prediction of the NN algorithm. It is easy to see that when there are no manipulators, the NN algorithm perfectly predicts $\hat{x}_{k,x^{k-1},Y} = x^*_k$, and therefore, after the user inspects $k$ products, his ratings history $x^k$ has $x^k_j = x^*_j$ for $j \le k$ and $x^k_j = ?$ for $j > k$.

We assume that manipulators produce one half of the training data. For each honest ratings vector $y^m$, manipulators produce a ratings vector $z^m$ that agrees with $y^m$ on all products rated by $y^m$. However, question marks in $y^m$ are replaced by 1 for even indices and 0 for odd indices. That is, each $z^m$ corresponds to an element of the set $\{(1,0), (0,1)\}^{N/2}$.

Suppose $k$ is even. Given $x^k$, the NN algorithm predicts what the active user's rating will be for product $k+1$. To do this, it identifies neighbors $\mathcal{N}(k+1, x^k, (Y,Z))$, which include the following subsets of the training data:

• A set $Y_1$, which consists of honest ratings vectors $y^m$ where $y^m_j \neq ?$ for $j \le k+1$.

• A set $Z_1$ that, for each $y^m \in Y_1$, includes the corresponding manipulated vector $z^m \in Z$.

• A set $Z_2$, which consists of each manipulated ratings vector $z^m$ such that the corresponding honest ratings vector $y^m$ has $y^m_j \neq ?$ for $j \le k$ and $y^m_{k+1} = ?$.
Note that each of these sets has cardinality $2^{(N-k)/2-1}$. Vectors in $Y_1$ and $Z_1$ correctly rate product $k+1$ as 1, whereas vectors in $Z_2$ incorrectly rate it as 0. As a consequence, the prediction for product $k+1$ is $\hat{x}_{k+1,x^k,(Y,Z)} = 2/3$, and the resulting squared error is

$$\left(\hat{x}_{k+1,x^k,Y} - \hat{x}_{k+1,x^k,(Y,Z)}\right)^2 = \left(1 - \frac{2}{3}\right)^2 = \frac{1}{9}.$$


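The preceding argument applies for all even $k$. For odd $k$, it is easy to show that the NN algorithm correctly predicts $\hat{x}_{k+1,x^k,(Y,Z)} = 0$. It follows that the RMS distortion for even $n$ is

$$d_n^{\mathrm{RMS}}(\rho,\nu,Y,Z) = \sqrt{\frac{1}{n}\sum_{k=1}^{n}\left(\hat{x}_{k,x^{k-1},Y} - \hat{x}_{k,x^{k-1},(Y,Z)}\right)^2} = \sqrt{\frac{1}{n}\cdot\frac{n}{2}\cdot\frac{1}{9}} = \frac{1}{3\sqrt{2}}.$$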



The preceding example shows that the RMS distortion of an NN algorithm for r = 1/2 does 
not decrease as n grows. This happens because manipulated data are strategically generated to 
be sufficiently similar to honest data so that no matter how many ratings an active user provides, 
manipulated ratings vectors will make up a fixed fraction of the neighbors and consequently induce 
a significant amount of distortion. 
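The construction can also be checked numerically. The sketch below, which repeats a compact version of the NN algorithm with hypothetical helper names, builds $Y$ and $Z$ for a few small even $N$, lets the active user rate products in the order $\nu = (1, \dots, N)$ according to the honest predictions, and computes the RMS distortion.

```python
import itertools
import math


def similarity(x, y):
    # +1 per product rated identically by both vectors, -1 per product rated differently
    return sum(1 if a == b else -1
               for a, b in zip(x, y) if a is not None and b is not None)


def nn_predict(n, x, W):
    # Simple NN algorithm (0-based product index n; None encodes "?").
    candidates = [w for w in W if w[n] is not None]
    if not candidates:
        return 1.0
    scores = [similarity(x, w) for w in candidates]
    best = max(scores)
    neighbors = [w for w, s in zip(candidates, scores) if s == best]
    return sum(w[n] for w in neighbors) / len(neighbors)


def build_example(N):
    """Honest data Y and manipulated data Z of the example, for even N (r = 1/2)."""
    assert N % 2 == 0
    Y, Z = [], []
    # Each pattern states, for every product pair (2j+1, 2j+2), whether y rates it.
    for pattern in itertools.product([True, False], repeat=N // 2):
        y, z = [], []
        for rated in pattern:
            if rated:
                y += [1, 0]
                z += [1, 0]  # z agrees with y wherever y is rated
            else:
                y += [None, None]
                z += [0, 1]  # '?' replaced by 0 (odd index) and 1 (even index)
        Y.append(y)
        Z.append(z)
    return Y, Z


def example_rms_distortion(N):
    Y, Z = build_example(N)
    x = [None] * N  # the active user's ratings history
    squared = 0.0
    for k in range(N):  # product k + 1 in the text's 1-based indexing
        honest = nn_predict(k, x, Y)
        corrupted = nn_predict(k, x, Y + Z)
        squared += (honest - corrupted) ** 2
        x[k] = honest  # the user rates as the NN algorithm predicts from Y
    return math.sqrt(squared / N)


for N in (4, 6, 8):
    print(N, example_rms_distortion(N), 1 / (3 * math.sqrt(2)))
```

For each $N$, the two printed numbers should agree up to floating-point rounding, matching the value $1/(3\sqrt{2}) \approx 0.2357$ derived above regardless of $N$.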

In contrast, Corollary 1 establishes that linear CF algorithms exhibit more graceful behavior, with RMS distortion vanishing as $n$ increases. This is not to say it is impossible to design an NN algorithm that behaves more desirably when applied to our example. However, it is difficult to know for sure whether a given variation will behave gracefully in all relevant situations.

4.5 Discussion 

We now provide an intuitive explanation for why linear CF algorithms should be robust to ma- 
nipulation relative to NN algorithms. First note that robustness depends on how a CF algorithm 
learns from its mistakes. In particular, a robust algorithm should notice as it observes differences 
between its predictions and an active user's ratings that certain things learned from the data set 
are hurting rather than improving its predictions. 

Recall that a linear CF algorithm $\rho$ generates, based on the training set $(Y,Z)$, a PMF $\psi^{\rho,(Y,Z)}$ that is a convex combination of $\psi^{\rho,Y}$ and $\psi^{\rho,Z}$, which are the PMFs that the algorithm would generate based on $Y$ and $Z$, respectively. As an active user rates more products, it becomes increasingly clear by probabilistic inference that his ratings $x$ are sampled from $\psi^{\rho,Y}$. In effect, inaccurate predictions induced by $Z$ increase the weight on $\psi^{\rho,Y}$ in the conditional PMF of ratings $\psi^{\rho,(Y,Z)}(\cdot\mid x)$ conditioned on observed ratings $x$, and this makes future predictions more accurate.
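To make this mechanism concrete, here is a hypothetical two-component example, not one of the algorithms studied in this paper: suppose $\psi^{\rho,Y}$ and $\psi^{\rho,Z}$ each rate every product 1 independently, with probability 0.9 and 0.1 respectively, and $r = 1/2$. Bayes' rule then gives the weight on $\psi^{\rho,Y}$ in $\psi^{\rho,(Y,Z)}(\cdot\mid x)$. All names and parameter values in the sketch are illustrative assumptions, and $r$ must lie strictly between 0 and 1.

```python
import math
from typing import Sequence


def honest_weight(r: float, p_honest: Sequence[float], p_manip: Sequence[float],
                  ratings: Sequence[int]) -> float:
    """Posterior weight of the honest component in (1 - r) * psi_Y + r * psi_Z
    after observing binary `ratings` for the first len(ratings) products.
    Hypothetical model: under each component, product n is rated 1
    independently with probability p_honest[n] or p_manip[n]."""
    log_y, log_z = math.log(1 - r), math.log(r)
    for n, x in enumerate(ratings):
        log_y += math.log(p_honest[n] if x == 1 else 1 - p_honest[n])
        log_z += math.log(p_manip[n] if x == 1 else 1 - p_manip[n])
    top = max(log_y, log_z)  # subtract the max for numerical stability
    wy, wz = math.exp(log_y - top), math.exp(log_z - top)
    return wy / (wy + wz)


def predict_next(r, p_honest, p_manip, ratings) -> float:
    """Mixture probability that the next product is rated 1, given `ratings`."""
    w = honest_weight(r, p_honest, p_manip, ratings)
    n = len(ratings)
    return w * p_honest[n] + (1 - w) * p_manip[n]


N = 10
p_honest, p_manip = [0.9] * N, [0.1] * N  # hypothetical component parameters
history = [1] * (N - 1)  # the active user keeps rating like an honest user
for n in (0, 3, 6, 9):
    print(n, honest_weight(0.5, p_honest, p_manip, history[:n]),
          predict_next(0.5, p_honest, p_manip, history[:n]))
```

After three ratings of 1, the weight on the honest component is already $729/730$, because each observation multiplies the likelihood ratio by $0.9/0.1 = 9$; the predicted probability for the next product moves toward the honest value 0.9 accordingly.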

In an NN algorithm, on the other hand, inaccurate predictions do not generally improve further 
predictions. In particular, manipulated ratings vectors that contribute to inaccuracies may remain 
in the set of neighbors while honest ratings vectors may be eliminated from it. In the example in 
Section 4.4, for instance, manipulated data are generated so that no matter how long an active user's
ratings history is, each honest ratings vector selected as a neighbor has a manipulated counterpart 
that is as similar, and hence also selected as a neighbor. Consequently, as an active user provides 
more ratings, the numbers of honest and manipulated neighbors both decrease and stay equal. As 
a result, inaccurate predictions do not decrease future distortion. 

5 Empirical Study 

In this section, we present our empirical findings on the manipulation robustness of NN, linear, 
and asymptotically linear CF algorithms. We first introduce the data set that we worked with and 
then describe the methods we used to evaluate robustness. 

5.1 Data Set 

We obtained a set of movie ratings that users provided to Netflix's recommendation system and that Netflix made publicly available. Each rating is an integer between 1 and 5, which we normalized to lie in $\{0, 0.25, 0.5, 0.75, 1\}$ so that the analysis and results in our paper apply directly. We randomly sampled from the data set 5000 users, who together provided 200,000 ratings of 500 movies. We then randomly chose 4000 of these users and, for the purpose of our experiments, treated them as honest users and their ratings as a training set $Y$. We used the ratings of the other 1000 users as a test set, which we refer to as $X$. We then generated three separate sets of 444, 1714, and 4000 manipulated ratings vectors, respectively. Each set, which we refer to as $Z$ for simplicity of discussion, is generated to promote 50% of the movies by using a technique reported to be effective in the literature [s, ll8|, 12^, 124, l26|, l3l|, l35|, |40(]. Specifically, we randomly sampled 250 of the 500 movies in $Y$ and let each manipulated ratings vector in each $Z$ assign the highest rating to these movies and assign to each of the other movies a random rating, sampled from that movie's empirical marginal PMF of ratings in $Y$. We then replaced a random subset of ratings in $Z$ with question marks so that its fraction of question marks matches that in $Y$. Manipulated ratings vectors generated this way are meant to be similar to honest ratings vectors except on the movies to promote.
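The manipulation technique just described can be sketched as follows. This is an illustration under stated assumptions rather than the code used in our experiments: ratings are encoded as floats with `None` for a question mark, the function and parameter names (`normalize`, `make_manipulated`, `promoted`, `count`) are hypothetical, and products that no honest user rated receive a uniformly random rating, a case the description above does not cover.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Set

Rating = Optional[float]  # None encodes "?"
LEVELS = [0.0, 0.25, 0.5, 0.75, 1.0]


def normalize(stars: int) -> float:
    """Map an integer rating in {1, ..., 5} to {0, 0.25, 0.5, 0.75, 1}."""
    return (stars - 1) / 4


def make_manipulated(Y: Sequence[Sequence[Rating]], promoted: Set[int],
                     count: int, rng: random.Random) -> List[List[Rating]]:
    N = len(Y[0])
    # Empirical marginal of each product's observed ratings in Y.
    marginals = [Counter(y[n] for y in Y if y[n] is not None) for n in range(N)]
    Z: List[List[Rating]] = []
    for _ in range(count):
        z: List[Rating] = []
        for n in range(N):
            if n in promoted:
                z.append(max(LEVELS))  # highest rating for promoted movies
            elif marginals[n]:
                values, weights = zip(*marginals[n].items())
                z.append(rng.choices(values, weights=weights)[0])
            else:
                z.append(rng.choice(LEVELS))  # assumption: uniform if unrated in Y
        Z.append(z)
    # Mask a random subset of entries so Z's fraction of '?' matches Y's.
    q = sum(v is None for y in Y for v in y) / (len(Y) * N)
    total = count * N
    for idx in rng.sample(range(total), round(q * total)):
        Z[idx // N][idx % N] = None
    return Z
```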

5.2 Evaluation Methods 

To test the robustness of each CF algorithm $\rho$, we treated ratings in $X$ as ratings that an active user would provide and let $\rho$ predict them. Specifically, we fixed $n$ and, for each ratings vector $x \in X$, identified $n$ random products that it has assigned ratings to and randomly permuted them to form an ordering $\nu^x = (\nu^x_1, \dots, \nu^x_n)$. For each $k \le n$, let $x^{k-1}$ be a ratings vector that agrees with $x$ on products $\nu^x_1, \dots, \nu^x_{k-1}$ and assigns question marks to the other products. The algorithm $\rho$ is then used to generate a scalar prediction $\hat{x}_{\nu^x_k,x^{k-1},Y}$ for the rating of product $\nu^x_k$ based on $x^{k-1}$ and the honest data set $Y$. Similarly, a prediction based on a training set $(Y,Z)$ corrupted by manipulated ratings is denoted by $\hat{x}_{\nu^x_k,x^{k-1},(Y,Z)}$. To assess influence due to manipulation, for each $n$, we computed the following quantity, which we will refer to as empirical RMS distortion:



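$$\hat{d}_n^{\mathrm{RMS}}(\rho,\nu^X,X,Y,Z) = \sqrt{\frac{1}{n|X|}\sum_{x\in X}\sum_{k=1}^{n}\left(\hat{x}_{\nu^x_k,x^{k-1},Y} - \hat{x}_{\nu^x_k,x^{k-1},(Y,Z)}\right)^2}$$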



Here, $\nu^X = \{(\nu^x_1, \dots, \nu^x_n) : x \in X\}$. The empirical RMS distortion measures changes of predictions for products rated by active users. It is similar to the RMS distortion $d_n^{\mathrm{RMS}}(\rho,\nu,Y,Z)$ that we defined earlier, with one difference: whereas $d_n^{\mathrm{RMS}}(\rho,\nu,Y,Z)$ samples each $x_{\nu_k}$ from the PMF $P_{\nu_k,x^{k-1},Y}$ that the algorithm generates based on $Y$, $\hat{d}_n^{\mathrm{RMS}}(\rho,\nu^X,X,Y,Z)$ uses elements of $X$ as samples. We used empirical RMS distortion rather than RMS distortion to assess algorithms in our empirical study because computing RMS distortion would take too long: the expectation ranges over all possible ratings histories, so the running time is exponential in the number of products $n$ rated by an active user. Further, if a CF algorithm generates a nearly correct distribution in the absence of manipulation, its empirical RMS distortion will be close to its RMS distortion.
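A minimal sketch of how the empirical RMS distortion could be computed, assuming a generic predictor interface `predict(product, history, training_set)` that returns a scalar prediction. This interface and the toy `column_mean` predictor are hypothetical, introduced only to make the example runnable.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Rating = Optional[float]  # None encodes "?"
Predictor = Callable[[int, Sequence[Rating], Sequence[Sequence[Rating]]], float]


def empirical_rms_distortion(predict: Predictor, X, Y, Z, n: int,
                             rng: random.Random) -> float:
    """Empirical RMS distortion over the test users in X.

    history is a ratings vector with '?' (None) on products not yet revealed."""
    corrupted = list(Y) + list(Z)
    total = 0.0
    for x in X:
        rated = [i for i, v in enumerate(x) if v is not None]
        order = rng.sample(rated, n)  # random ordering nu^x of n rated products
        history: List[Rating] = [None] * len(x)
        for product in order:
            diff = predict(product, history, Y) - predict(product, history, corrupted)
            total += diff ** 2
            history[product] = x[product]  # then reveal the user's true rating
    return math.sqrt(total / (n * len(X)))


def column_mean(product, history, W):
    """Toy predictor that ignores the history: mean observed rating of the product."""
    vals = [w[product] for w in W if w[product] is not None]
    return sum(vals) / len(vals)
```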
One might also wonder whether high robustness of CF algorithms stems from high prediction accuracy or comes at the expense of it. To better understand the relationship between these two performance measures, we also computed the following RMS error for each CF algorithm, which we will refer to as empirical RMS prediction error:

$$\hat{e}_n^{\mathrm{RMS}}(\rho,\nu^X,X,Y) = \sqrt{\frac{1}{n|X|}\sum_{x\in X}\sum_{k=1}^{n}\left(x_{\nu^x_k} - \hat{x}_{\nu^x_k,x^{k-1},Y}\right)^2}.$$

This quantity computes the RMS error of predictions for ratings in $X$ when the algorithm uses $Y$ as training data.
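Under the same hypothetical predictor interface, the empirical RMS prediction error can be sketched as follows. Only the training set $Y$ is used, and each true rating is revealed to the algorithm after it has been predicted.

```python
import math
import random
from typing import List, Optional

Rating = Optional[float]  # None encodes "?"


def empirical_rms_error(predict, X, Y, n: int, rng: random.Random) -> float:
    """RMS error of predictions for the test ratings in X, trained on Y only."""
    total = 0.0
    for x in X:
        rated = [i for i, v in enumerate(x) if v is not None]
        history: List[Rating] = [None] * len(x)
        for product in rng.sample(rated, n):  # random ordering of n rated products
            total += (x[product] - predict(product, history, Y)) ** 2
            history[product] = x[product]
    return math.sqrt(total / (n * len(X)))


def column_mean(product, history, W):
    """Toy predictor that ignores the history (for illustration only)."""
    vals = [w[product] for w in W if w[product] is not None]
    return sum(vals) / len(vals)
```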

For the algorithms that we tested, we tuned some of their parameters by cross validation. This technique selects parameter values based on the performance of the corresponding algorithm on out-of-sample data, which serves as an estimate of performance on future data. Specifically, we randomly sampled 20% of the users in $Y$. We treated their ratings as a validation set $V$ and generated predictions based on the remaining ratings $Y \setminus V$. Consider a parameter $\gamma$ that we tuned for an algorithm. For each value $\gamma'$ in a range $\Gamma$, we set $\gamma = \gamma'$, used the corresponding algorithm $\rho_{\gamma'}$ to predict ratings in $V$ based on ratings in $Y \setminus V$, and computed the empirical RMS prediction error $\hat{e}_n^{\mathrm{RMS}}(\rho_{\gamma'},\nu^V,V,Y\setminus V)$. Finally, we selected a parameter value $\gamma^*$ that results in a minimal error. Similarly, when using $(Y,Z)$ as the training set, we sampled the validation set $V$ from $(Y,Z)$ and, for each $\gamma' \in \Gamma$, computed the empirical RMS prediction error $\hat{e}_n^{\mathrm{RMS}}(\rho_{\gamma'},\nu^V,V,(Y,Z)\setminus V)$ and selected a $\gamma^*$. Note that we chose to optimize for prediction accuracy rather than robustness in cross validation because we wanted the algorithms to maintain reasonable accuracy and wanted to avoid tuning them to be robust for specific manipulation techniques.
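The cross-validation procedure can be sketched generically as below. The callback `error_fn` is a hypothetical stand-in for training an algorithm with a given parameter value and computing its empirical RMS prediction error on the held-out users; the 20% hold-out fraction matches the description above.

```python
import random
from typing import Any, Callable, Dict, List, Sequence, Tuple


def cross_validate(error_fn: Callable[[Any, List[Any], List[Any]], float],
                   users: Sequence[Any], grid: Sequence[Any],
                   rng: random.Random, holdout: float = 0.2
                   ) -> Tuple[Any, Dict[Any, float]]:
    """Hold out a random `holdout` fraction of users as a validation set and
    return the grid value with the smallest validation error, plus all errors.

    error_fn(param, validation, training) would train the algorithm with
    `param` on `training` and return its error on `validation`."""
    k = max(1, round(holdout * len(users)))
    held_out = set(rng.sample(range(len(users)), k))
    validation = [u for i, u in enumerate(users) if i in held_out]
    training = [u for i, u in enumerate(users) if i not in held_out]
    errors = {g: error_fn(g, validation, training) for g in grid}
    best = min(errors, key=errors.get)
    return best, errors
```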

Overall, for each algorithm $\rho$, we generated multiple samples of $X$, $Y$, $Z$, and $\nu^X$, and averaged their resultant $\hat{d}_n^{\mathrm{RMS}}(\rho,\nu^X,X,Y,Z)$ and $\hat{e}_n^{\mathrm{RMS}}(\rho,\nu^X,X,Y)$ across samples to obtain reliable estimates. To summarize with our notation, $S = \{0, 0.25, 0.5, 0.75, 1\}$, $N = 500$, $(M, r) \in \{(4444, 0.1), (5714, 0.3), (8000, 0.5)\}$, and $1 \le n \le 40$. We tested three CF algorithms: a linear CF algorithm called kernel density estimation, an asymptotically linear CF algorithm called naive Bayes, and an NN algorithm called k nearest neighbor. We now present them in detail.



5.3 Kernel Density Estimation Algorithms

Kernel density estimation (KDE) algorithms smooth the training data and use their resultant 
distribution to predict future ratings. For an in-depth treatment of KDE algorithms, see [l^. In our 
context, we say that a probabilistic CF algorithm $\rho$ is a KDE algorithm with kernels $\{\mathcal{K}_w : w \in \bar{S}^N\}$ if for any $W \in \bar{S}^{N\times M}$,

$$\psi^{\rho,W}(x) = \frac{1}{M}\sum_{w\in W}\mathcal{K}_w(x), \quad \forall x \in S^N,$$

where each $\mathcal{K}_w$ is a PMF over $S^N$ parameterized by a ratings vector $w$. It turns out that any KDE algorithm is a linear CF algorithm and any linear CF algorithm is a KDE algorithm. We will establish this in Proposition 3 in Appendix A.4.

In our experiments, we considered a KDE algorithm with kernels $\{\mathcal{K}_w\}$ such that for each type $x \in S^N$,

$$\mathcal{K}_w(x) = \prod_{n=1}^{N} k_{w_n}(x_n),$$

where for each $s \in \bar{S}$, $k_s$ is a PMF over $S$ defined as follows. For $s \neq ?$, $k_s$ is the unique PMF that satisfies $k_s(s')/k_s(s) = \exp(-|s - s'|/\beta)$ for all $s' \in S$. For $s = ?$, $k_s(s') = 1/|S|$ for all $s' \in S$. That is, $k_s$ assigns the highest probability to $s$ and exponentially lower probabilities to values different from $s$ if $s \neq ?$, and assigns uniform probability to all values if $s = ?$. It is easy to see that each $\mathcal{K}_w$ thus defined is a PMF, and it assigns high probability to types similar to $w$ and low probability to others. The constant $\beta > 0$ tunes the shape of $k_s$, which we set to be 0.15 in our experiments.
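The kernels can be sketched in Python as follows. The sketch assumes the ratio form $k_s(s')/k_s(s) = \exp(-|s - s'|/\beta)$ for $s \neq ?$, encodes a question mark as `None`, and uses illustrative function names `k` and `K` that mirror the notation.

```python
import math
from typing import Dict, Optional, Sequence

S = [0.0, 0.25, 0.5, 0.75, 1.0]


def k(s: Optional[float], beta: float = 0.15) -> Dict[float, float]:
    """Kernel k_s over S. For s = '?' (None) it is uniform; otherwise, under the
    assumed form, k_s(s') / k_s(s) = exp(-|s - s'| / beta)."""
    if s is None:
        return {v: 1 / len(S) for v in S}
    weights = {v: math.exp(-abs(s - v) / beta) for v in S}
    total = sum(weights.values())
    return {v: wt / total for v, wt in weights.items()}


def K(w: Sequence[Optional[float]], x: Sequence[float], beta: float = 0.15) -> float:
    """K_w(x) = prod_n k_{w_n}(x_n) for a type x in S^N."""
    prob = 1.0
    for wn, xn in zip(w, x):
        prob *= k(wn, beta)[xn]
    return prob
```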

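To predict the rating of product $\nu_n$ for a user with past ratings $x^{n-1}$, our KDE algorithm generates a PMF $P_{\nu_n,x^{n-1},W}$, which is the conditional PMF of $x_{\nu_n}$ conditioned on $x^{n-1}$ with respect to the joint PMF $\psi^{\rho,W}$, given by

$$P_{\nu_n,x^{n-1},W}(s) = \psi^{\rho,W}\left(x_{\nu_n} = s \mid x^{n-1}\right) = \frac{\sum_{w\in W}\prod_{k=1}^{n-1} k_{w_{\nu_k}}(x_{\nu_k})\; k_{w_{\nu_n}}(s)}{\sum_{w\in W}\prod_{k=1}^{n-1} k_{w_{\nu_k}}(x_{\nu_k})}$$

for each $s \in S$. The corresponding scalar prediction is the expectation taken with respect to $P_{\nu_n,x^{n-1},W}$:

$$\hat{x}_{\nu_n,x^{n-1},W} = \sum_{s\in S} s\, P_{\nu_n,x^{n-1},W}(s).$$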
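A compact sketch of this prediction rule, reusing the assumed kernel form $k_s(s')/k_s(s) = \exp(-|s-s'|/\beta)$ (the function names and the `history` dictionary, which maps a product index to its observed rating in $S$, are hypothetical choices). For long ratings histories, a practical implementation would work with log-likelihoods to avoid numerical underflow; the sketch omits this.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

S = [0.0, 0.25, 0.5, 0.75, 1.0]


def kernel(s: Optional[float], beta: float) -> Dict[float, float]:
    # k_s: uniform for s = '?' (None), otherwise proportional to exp(-|s - s'| / beta)
    if s is None:
        return {v: 1 / len(S) for v in S}
    weights = {v: math.exp(-abs(s - v) / beta) for v in S}
    total = sum(weights.values())
    return {v: wt / total for v, wt in weights.items()}


def kde_predict(product: int, history: Dict[int, float],
                W: Sequence[Sequence[Optional[float]]],
                beta: float = 0.15) -> Tuple[Dict[float, float], float]:
    """Conditional PMF of the rating of `product` given `history`, and its mean."""
    # Weight of each training vector w: prod_k k_{w_{nu_k}}(x_{nu_k}).
    likelihoods = []
    for w in W:
        lik = 1.0
        for j, xj in history.items():
            lik *= kernel(w[j], beta)[xj]
        likelihoods.append(lik)
    total = sum(likelihoods)
    pmf = {s: 0.0 for s in S}
    for w, lik in zip(W, likelihoods):
        for s, ks in kernel(w[product], beta).items():
            pmf[s] += lik * ks / total
    return pmf, sum(s * p for s, p in pmf.items())
```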
5.4 Naive Bayes Algorithm

A naive Bayes (NB) algorithm assumes that the true distribution of data is a convex combination of distinct distributions in each of which features of the data are conditionally independent. It aims to learn from training data the weights of the combination and the feature marginals within each distribution. For a formal analysis of the algorithm and its applications to other problem settings, see [0,0 IIGI]. We now describe the particular version of the algorithm that we used and discuss the context in which it is consistent and asymptotically linear.



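Our NB algorithm assumes that data are sampled from a joint PMF $\phi$ over $S^N \times \bar{S}^N$ such that for each $(w, \bar w) \in S^N \times \bar{S}^N$ with $\bar w \preceq w$,

$$\phi(w, \bar w) = q^{\|\bar w\|_?}(1-q)^{N-\|\bar w\|_?}\sum_{l=1}^{L}\eta_l\prod_{n=1}^{N}\theta_{l,n}(w_n), \qquad (2)$$

where $q \in [0,1)$, $L \in \mathbb{Z}_+$, $\eta \in \Delta_L$, and $\theta_{l,n} \in \Delta_{|S|}$ for all $l, n$. Here, we write $\bar w \preceq w$ for $(w, \bar w)$ if for each $n$, either $\bar w_n = w_n$ or $\bar w_n = ?$. We let $\|\bar w\|_?$ denote $|\{n : \bar w_n = ?\}|$. For any $k$, we define the simplex $\Delta_k = \{(t_1, \dots, t_k) : t_j \ge 0, \forall 1 \le j \le k, \sum_{j=1}^{k} t_j = 1\}$. We also let $\theta = \{\theta_{l,n} : 1 \le l \le L, 1 \le n \le N\}$.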

To understand $\phi$, let us consider the following generative process of ratings vectors. Let $\Phi = \{\phi_1, \dots, \phi_L\}$ be $L$ PMFs over $S^N$, where each $\phi_l$ satisfies

$$\phi_l(w) = \prod_{n=1}^{N}\theta_{l,n}(w_n)$$

for all $w \in S^N$. That is, each $w_n$ is independently distributed and is equal to $s \in S$ with probability $\theta_{l,n}(s)$. A type $w$ is generated by first selecting a PMF from $\Phi$, where each $\phi_l$ is chosen with probability $\eta_l$, and then sampling from that PMF. A ratings vector $\bar w$ is then generated by randomly replacing each rating $w_n$ by a question mark with probability $q$, independent of the value $w_n$ and of whether other ratings are replaced by question marks.
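The generative process and Eq. (2) can be sketched as follows, with hypothetical function names. `joint_prob` evaluates $\phi(w,\bar w)$ and returns 0 for pairs that the process cannot produce (a rated entry of $\bar w$ that differs from $w$); `total_probability` is a brute-force check, feasible only for tiny $N$, that these values sum to one.

```python
import itertools
import random


def sample_ratings_vector(eta, theta, q, S, rng):
    """One draw from the NB generative process.

    eta[c] is the weight of component c, theta[c][n][i] the probability that
    product n takes value S[i] under component c, and q the masking probability."""
    comp = rng.choices(range(len(eta)), weights=eta)[0]  # select phi_c w.p. eta_c
    w = [rng.choices(S, weights=theta[comp][n])[0] for n in range(len(theta[comp]))]
    return [None if rng.random() < q else wn for wn in w]  # mask each rating w.p. q


def joint_prob(w, wbar, eta, theta, q, S):
    """phi(w, w-bar) as in Eq. (2); 0 if w-bar disagrees with w on a rated entry."""
    if any(b is not None and b != a for a, b in zip(w, wbar)):
        return 0.0
    missing = sum(b is None for b in wbar)  # ||w-bar||_?
    mixture = 0.0
    for comp, weight in enumerate(eta):
        for n, wn in enumerate(w):
            weight *= theta[comp][n][S.index(wn)]
        mixture += weight
    return q ** missing * (1 - q) ** (len(w) - missing) * mixture


def total_probability(eta, theta, q, S):
    """Brute-force sum of phi over all (w, w-bar) pairs; feasible for tiny N only."""
    N = len(theta[0])
    total = 0.0
    for w in itertools.product(S, repeat=N):
        for wbar in itertools.product(*[(wn, None) for wn in w]):
            total += joint_prob(w, wbar, eta, theta, q, S)
    return total
```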

The algorithm also assumes a geometric prior for $L$ and Dirichlet priors for $\eta$, $\theta$, and $q$. Hence, the posterior probability density function (PDF) of $(L, \eta, \theta, q)$ conditioned on training data $W$ is given by

$$f(L,\eta,\theta,q \mid W) = c\, f(L,\eta,\theta,q)\,\Pr(W \mid L,\eta,\theta,q), \qquad (3)$$
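where $c$ is a normalizing constant and the prior PDF is

$$f(L,\eta,\theta,q) = p^r_L(L)\, f^L_\eta(\eta)\prod_{l=1}^{L}\prod_{n=1}^{N} f_\theta(\theta_{l,n})\, f_q(q),$$

with

$$p^r_L(L) = c_L e^{-L/r}, \qquad f^L_\eta(\eta) = c^L_\eta\prod_{l=1}^{L}\eta_l, \qquad f_\theta(\theta_{l,n}) = c_\theta\prod_{s\in S}\theta_{l,n}(s)\ \ \forall l,n, \qquad f_q(q) = c_q\, q(1-q),$$

and data likelihood

$$\Pr(W \mid L,\eta,\theta,q) = \prod_{\bar w\in W}\left(q^{\|\bar w\|_?}(1-q)^{N-\|\bar w\|_?}\sum_{l=1}^{L}\eta_l\prod_{n:\,\bar w_n\neq ?}\theta_{l,n}(\bar w_n)\right).$$

Here, subscripts $L$, $\eta$, $\theta$, and $q$ of the functions $p_L$, $f_\eta$, $f_\theta$, and $f_q$ denote the parameters that the distributions are over. The superscript $r$ in $p^r_L$ denotes dependence on parameter $r$, which controls the shape of the geometric PMF. The superscript $L$ in $f^L_\eta$ denotes dependence on $L$. $c_L$, $c^L_\eta$, $c_\theta$, and $c_q$ are normalizing constants.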

The algorithm maximizes the posterior PDF over parameters by using the expectation-maximization algorithm [9] and obtains

$$(\hat L, \hat\eta, \hat\theta, \hat q) \in \arg\max_{L,\eta,\theta,q} f(L,\eta,\theta,q \mid W).$$

We denote by $\psi^{\rho,W}$ the PMF over $S^N$ implied by $(\hat L, \hat\eta, \hat\theta)$, which the algorithm uses for predictions.

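In particular, a prediction for the rating of product $\nu_n$ for a user with past ratings $x^{n-1}$ is given by the PMF

$$P_{\nu_n,x^{n-1},W}(s) = \frac{\sum_{l=1}^{\hat L}\hat\eta_l\prod_{k=1}^{n-1}\hat\theta_{l,\nu_k}(x_{\nu_k})\;\hat\theta_{l,\nu_n}(s)}{\sum_{l=1}^{\hat L}\hat\eta_l\prod_{k=1}^{n-1}\hat\theta_{l,\nu_k}(x_{\nu_k})}$$

for each $s \in S$. The corresponding scalar prediction is

$$\hat{x}_{\nu_n,x^{n-1},W} = \sum_{s\in S} s\, P_{\nu_n,x^{n-1},W}(s).$$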
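Given fitted parameters $(\hat L, \hat\eta, \hat\theta)$, the prediction rule can be sketched as below. The sketch assumes, hypothetically, that `theta[l][n][i]` stores $\hat\theta_{l,n}(S[i])$ and that the history maps product indices to observed ratings; it does not implement the expectation-maximization step.

```python
from typing import Dict, List, Sequence, Tuple


def nb_predict(product: int, history: Dict[int, float], eta: Sequence[float],
               theta: Sequence[Sequence[Sequence[float]]],
               S: Sequence[float]) -> Tuple[List[float], float]:
    """Conditional PMF over S for `product` and its mean, given `history`."""
    # Unnormalized posterior weight of each mixture component given the history.
    post = []
    for l, weight in enumerate(eta):
        for j, xj in history.items():
            weight *= theta[l][j][S.index(xj)]
        post.append(weight)
    total = sum(post)
    # Mix the components' marginals for `product` using the posterior weights.
    pmf = [sum(post[l] * theta[l][product][i] for l in range(len(eta))) / total
           for i in range(len(S))]
    return pmf, sum(s * p for s, p in zip(S, pmf))
```

With two equally weighted components that rate each product 1 with probability 0.9 and 0.1, observing a rating of 1 on one product raises the predicted rating of another from 0.5 to 0.82.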

In our experiments, we tuned the parameter $r$ by cross validation over the range $\Gamma = \{1, 10, 100, 1000, 10000, 100000\}$ and settled at $r = 10000$.

We now discuss the context in which the NB algorithm is consistent and asymptotically linear. For any $a \in [0,1)$, we let $\Psi^a$ be the set of all joint PMFs over $S^N \times \bar{S}^N$ of the form in (2) where $q$ is fixed to be $a$. We establish in Proposition 4 in Appendix A.4 that for any $a$, $\Psi^a$ is identifiable and convex, and the NB algorithm is consistent with respect to it. Then, by Theorems 2 and 3, the algorithm is asymptotically linear with respect to $\Psi^a$ and our distortion bounds apply. Let $\Psi^a_S = \{\psi_S : \psi \in \Psi^a\}$ be the set of marginals over $S^N$ of PMFs in $\Psi^a$. We note that for any $a$, $\Psi^a_S$ contains all PMFs over $S^N$. This implies that for any PMFs $\mu$ and $\pi$ over $S^N$, if honest and manipulated data are generated by first sampling types from these PMFs and then independently replacing each rating by a question mark with the same probability $a$, then the NB algorithm will be asymptotically robust as the sample size grows. In our experiments, although the condition regarding replacement by question marks may not hold, we still apply the NB algorithm in the hope that it will deliver reasonable robustness.

5.5 k Nearest Neighbor Algorithm 

A class of NN algorithms called k nearest neighbor (kNN) algorithms is frequently used as a performance benchmark in prior work [18, 40]. The version that we tested works as follows. To predict the rating of product $\nu_n$ by a user with past ratings $x^{n-1}$, where $n \ge 3$, the algorithm identifies a set of neighbors $\mathcal{N}(\nu_n, x^{n-1}, W)$ to be the $k$ ratings vectors $w \in W$ with $w_{\nu_n} \neq ?$ that score highest with $x^{n-1}$ on the following similarity measure:

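$$s(w, x^{n-1}) = \frac{\sum_{1\le i\le N:\ w_i\neq ?,\ x^{n-1}_i\neq ?}(w_i - \bar w)(x^{n-1}_i - \bar x^{n-1})}{\sqrt{\sum_{1\le i\le N:\ w_i\neq ?}(w_i - \bar w)^2}\ \sqrt{\sum_{1\le i\le n-1}\left(x_{\nu_i} - \bar x^{n-1}\right)^2}},$$

where average ratings are given by

$$\bar w = \frac{\sum_{1\le i\le N:\ w_i\neq ?} w_i}{|\{1\le i\le N : w_i \neq ?\}|}, \qquad \bar x^{n-1} = \frac{\sum_{1\le i\le n-1} x_{\nu_i}}{n-1}.$$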

Note that $s$ here resembles the notion of a sample correlation coefficient. Its numerator is the covariance between non-question-mark components of $w$ and $x^{n-1}$. The denominator is the product of the standard deviation of non-question-mark components of $w$ and the same quantity for $x^{n-1}$. The algorithm then generates the following scalar prediction:

$$\hat{x}_{\nu_n,x^{n-1},W} = \min\left\{s_{\max},\ \max\left\{s_{\min},\ \bar x^{n-1} + \frac{\sum_{w\in\mathcal{N}(\nu_n,x^{n-1},W)} s(w,x^{n-1})\,(w_{\nu_n} - \bar w)}{\sum_{w\in\mathcal{N}(\nu_n,x^{n-1},W)} |s(w,x^{n-1})|}\right\}\right\},$$

where $s_{\max} = \max\{s : s \in S\}$ and $s_{\min} = \min\{s : s \in S\}$. To arrive at this quantity, for each neighbor, the difference between its rating for product $\nu_n$ and its average rating $\bar w$ is first computed. A weighted sum of these differences is then computed, where the weights are normalized similarity measures. The user's historical ratings average $\bar x^{n-1}$ is then added to the sum. The total is used as the prediction, unless it falls outside $[s_{\min}, s_{\max}]$, in which case either $s_{\min}$ or $s_{\max}$ is used, whichever is closer.

For a user with ratings history $x^{n-1}$ where $n \le 2$, $s(\cdot, x^{n-1})$ is not well-defined. In this case, the algorithm uses the average rating of product $\nu_n$ in the training data to generate the prediction:

$$\hat{x}_{\nu_n,x^{n-1},W} = \min\left\{s_{\max},\ \max\left\{s_{\min},\ \frac{\sum_{w\in W:\ w_{\nu_n}\neq ?} w_{\nu_n}}{|\{w \in W : w_{\nu_n} \neq ?\}|}\right\}\right\}.$$
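A sketch of this kNN predictor in Python follows. Two situations are not specified in the text, and the sketch handles them by assumption: a similarity whose denominator is zero is set to 0, and if all selected neighbors have zero similarity, the prediction falls back to $\bar x^{n-1}$. The history is a list of (product, rating) pairs, products are indexed from 0, and the function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Rating = Optional[float]  # None encodes "?"
S = (0.0, 0.25, 0.5, 0.75, 1.0)


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def _clamp(v: float) -> float:
    # Project the prediction onto [s_min, s_max].
    return min(max(S), max(min(S), v))


def similarity(w: Sequence[Rating], history: Sequence[Tuple[int, float]]) -> float:
    """Correlation-like score s(w, x^{n-1})."""
    w_rated = [v for v in w if v is not None]
    w_bar = _mean(w_rated)
    x_bar = _mean([r for _, r in history])
    num = sum((w[j] - w_bar) * (r - x_bar) for j, r in history if w[j] is not None)
    den = (math.sqrt(sum((v - w_bar) ** 2 for v in w_rated))
           * math.sqrt(sum((r - x_bar) ** 2 for _, r in history)))
    return num / den if den > 0 else 0.0  # assumption: score 0 when undefined


def knn_predict(product: int, history: Sequence[Tuple[int, float]],
                W: Sequence[Sequence[Rating]], k: int = 10) -> float:
    raters = [w for w in W if w[product] is not None]  # assumed non-empty
    if len(history) < 2:  # n <= 2: use the product's average rating
        return _clamp(_mean([w[product] for w in raters]))
    scored = sorted(((similarity(w, history), w) for w in raters),
                    key=lambda t: t[0], reverse=True)[:k]
    x_bar = _mean([r for _, r in history])
    norm = sum(abs(s) for s, _ in scored)
    if norm == 0:
        return _clamp(x_bar)  # assumption: no informative neighbors
    offset = sum(s * (w[product] - _mean([v for v in w if v is not None]))
                 for s, w in scored) / norm
    return _clamp(x_bar + offset)
```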

In our experiments, we tuned the number of neighbors $k$ by cross validation over the range $\Gamma = \{1, 2, \dots, 40\}$ and settled at $k = 10$.

Note that even though the kNN algorithm generates scalar predictions, it still fits our definition of CF algorithms because it is possible to construct PMFs whose corresponding expectations equal the predictions $\hat{x}_{\nu_n,x^{n-1},W}$. We do not explicitly define such PMFs, however, because they are not necessary for computing the empirical RMS distortion in our experiments.
5.6 Results

Figure 2 shows the empirical RMS distortions for the three algorithms that we tested, with different fractions of manipulated data. Our results suggest that in practice, NB and KDE algorithms are significantly more robust than kNN. In particular, when a user's ratings history is short, kNN and NB both incur higher empirical RMS distortions than KDE. This difference arises because while kNN and NB ignore question marks, KDE uses them and, as a result, tempers its predictions. To gain some intuition, consider the following problem instance where ratings are binary: the set of honest ratings $Y$ consists of $K$ vectors whose entries are all 1s and as many vectors whose entries are all question marks. The set of manipulated ratings $Z$ consists of $K$ vectors whose entries are all 0s and as many vectors whose entries are all question marks. To predict the first rating $x_{\nu_1}$ of an active user, kNN would yield predictions of 1 and 1/2 based on $Y$ and $(Y,Z)$, respectively, incurring an RMS distortion of 1/2. KDE would yield a prediction close to 3/4 based on $Y$ and a prediction of 1/2 based on $(Y,Z)$, incurring an RMS distortion of about 1/4, significantly less than that of kNN. Clearly, the presence of question marks smooths KDE's predictions and keeps its distortion low.
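The toy instance can be reproduced numerically. The sketch assumes binary ratings, the exponential kernel $k_s(s')/k_s(s) = \exp(-|s-s'|/\beta)$ with $\beta = 0.15$, and $K = 50$; for the first rating, kNN uses the product-average rule that it applies when $n \le 2$. Function names are hypothetical.

```python
import math


def kde_first_prediction(W, beta=0.15, S=(0, 1)):
    """Scalar KDE prediction for an active user with no ratings yet:
    the mean rating of product 1 under the average of the kernels k_{w_1}."""
    def kernel(s):
        if s is None:
            return {v: 1 / len(S) for v in S}  # uniform kernel for '?'
        e = {v: math.exp(-abs(s - v) / beta) for v in S}
        z = sum(e.values())
        return {v: ev / z for v, ev in e.items()}
    return sum(v * kernel(w[0])[v] for w in W for v in S) / len(W)


def average_first_prediction(W):
    """kNN fallback for n <= 2: average observed rating of product 1."""
    rated = [w[0] for w in W if w[0] is not None]
    return sum(rated) / len(rated)


K, N = 50, 3
Y = [[1] * N for _ in range(K)] + [[None] * N for _ in range(K)]
Z = [[0] * N for _ in range(K)] + [[None] * N for _ in range(K)]
for name, f in [("kNN", average_first_prediction), ("KDE", kde_first_prediction)]:
    honest, corrupted = f(Y), f(Y + Z)
    print(name, round(honest, 4), round(corrupted, 4), round(abs(honest - corrupted), 4))
```

kNN's distortion on the first rating is exactly 1/2, while KDE's is about 0.249, in line with the estimates of 1/2 and 1/4 above.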

In Figure 2, as more ratings are provided, distortions incurred by all three algorithms decrease. When a user's ratings history is long, NB and KDE incur distortions significantly lower than that of kNN. Note that distortions of NB and KDE always stay below the bound in Corollary 1. The curves for kNN are flat for $n \le 2$ because the algorithm predicts the first two ratings $x_{\nu_1}$ and $x_{\nu_2}$ of an active user in the same way, using product averages that do not depend on the user's history. Note also that as the fraction of manipulated data $r$ increases, distortions incurred by all three algorithms increase as well.

Figure 3 displays the empirical RMS prediction errors of the three algorithms. As $n$ grows, their errors all decrease; when $n$ is large, NB offers the lowest error and kNN the highest. kNN sees a spike around $n = 3$ because the algorithm switches its prediction method there: it generates predictions by using average ratings of all users for $n \le 2$ and by using average ratings of neighbors for $n \ge 3$.

To put our results in perspective, we note that Netflix announced that its proprietary algorithm achieves an empirical RMS prediction error, normalized to our scale, of 0.238 on a large test set, and will award one million dollars to anyone who improves it to 0.214 [32]. One might wonder why a decrease of 0.024 may have such a large impact on recommendation quality. We suspect that, due to the large number of movies, many of them are given similar predicted ratings. As a result, a small improvement in prediction accuracy may tease apart these movies and identify the most desirable ones.

Figure 2: Empirical RMS distortion as a function of n, for different r.

Figure 3: Empirical RMS prediction error as a function of n.

Compared to Netflix's benchmark and target prediction errors, our results are reasonable but 
not competitive. This is because we did not focus on optimizing the prediction accuracy of the 
algorithms. If our objective were to achieve the highest possible accuracy while maintaining reasonable robustness, one option we could try is to fine-tune our robust algorithms to be accurate. For
example, for KDE algorithms, we could work to identify more effective kernels. For NB algorithms, 
we could choose different priors or use methods other than expectation-maximization to find the 
model parameters. We could probably also design other robust linear and asymptotically linear CF 
algorithms that achieve higher accuracy as well. Overall, we are not suggesting that in practice, the 
specific algorithms that we presented should be directly implemented. Instead, one should either 
use them as starting points or take the insights that they yield into consideration when designing 
accurate and robust CF systems. 
6 Conclusion



Our analytical and empirical work suggests that linear and asymptotically linear algorithms can be 
more robust to manipulation than commonly used nearest neighbor algorithms. Our results also 
suggest that it is possible to design algorithms that achieve accuracy alongside robustness. As such, 
recommendation systems of Internet commerce sites may improve their robustness to manipulation 
by adopting the approaches that we describe. They may also use the bounds on distortion that we 
establish as a guide on how many ratings each user should provide to a recommendation system 
before its predictions can be trusted. 

The simple setting in our work serves as a context for the initial development of our idea, and 
can be extended in multiple ways. One direction is to study the robustness of collaborative filtering 
algorithms as measured by alternative metrics. One metric could be, for instance, a user's utility 
loss due to manipulation. Another extension is to design algorithms that provide non-asymptotic 
guarantees on both prediction accuracy and robustness. 

The framework that we establish also facilitates studying the effectiveness of alternative tech- 
niques to abate influence by manipulators. For instance, given a scheme that incentivizes users to 
inspect and rate products, one could analyze how honest users and manipulators would behave, 
and then use our distortion metrics to assess the robustness of the scheme to manipulation. 

It is also worth mentioning that many commercial recommendation systems build on multiple sources of information, not just collaborative filtering. For example, as discussed in [0], recommendations should also be guided by features of the products being recommended. An added
benefit of the approaches that we present is that they facilitate coherent fusion of multiple sources 
of information. 
A Proofs

A.1 Relationships Among Distortion Measures

Propositions 1 and 2 state relationships between KL, RMS, and binary prediction distortions. Lemmas 1 and 2 help prove them.

Lemma 1. Consider two PMFs $p$ and $q$ with support on the same finite set $U \subset [0,1]$. Let $u$ denote a dummy variable. It holds that

$$\left|\mathbb{E}_p[u] - \mathbb{E}_q[u]\right| \le \frac{1}{2}\|p - q\|_1.$$

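Proof. Let $U = \{u_1, \dots, u_N\}$ and correspondingly, let $p_i = p(u_i)$ and $q_i = q(u_i)$, for $1 \le i \le N$. Without loss of generality, let $p_1 - q_1 \ge p_2 - q_2 \ge \dots \ge p_N - q_N$. There exists $n$ such that $p_i - q_i \ge 0$ for $i \le n$ and $p_i - q_i < 0$ for $i > n$. Since $\sum_{i=1}^{N}(p_i - q_i) = 0$, it follows that $\sum_{i=1}^{n}|p_i - q_i| = \sum_{i=n+1}^{N}|p_i - q_i| = \frac{1}{2}\|p - q\|_1$. We then have

$$\begin{aligned}
\left|\mathbb{E}_p[u] - \mathbb{E}_q[u]\right| &= \left|\sum_{i=1}^{N}u_i(p_i - q_i)\right| = \left|\sum_{i=1}^{n}u_i(p_i - q_i) + \sum_{i=n+1}^{N}u_i(p_i - q_i)\right| \\
&\le \max\left\{\sum_{i=1}^{n}|p_i - q_i|,\ \sum_{i=n+1}^{N}|p_i - q_i|\right\} = \frac{1}{2}\|p - q\|_1,
\end{aligned}$$

where the inequality holds because $u_i \in [0,1]$, so the first sum lies in $[0, \sum_{i=1}^{n}|p_i - q_i|]$ and the second lies in $[-\sum_{i=n+1}^{N}|p_i - q_i|, 0]$. □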
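As a numerical sanity check (not a substitute for the proof), the following sketch samples random PMFs on random supports in $[0,1]$ and records the largest observed violation of Lemma 1 and of Pinsker's inequality $D(p\|q) \ge \frac{1}{2}\|p-q\|_1^2$ (natural logarithm), which is used in the proof of Proposition 1 below. Function names are hypothetical.

```python
import math
import random


def random_pmf(size, rng):
    weights = [rng.random() + 1e-9 for _ in range(size)]  # strictly positive
    total = sum(weights)
    return [v / total for v in weights]


def max_violations(trials=2000, size=5, seed=0):
    """Largest observed (lhs - rhs) for Lemma 1 and for Pinsker's inequality.
    Both values should be <= 0 up to floating-point rounding."""
    rng = random.Random(seed)
    lemma1 = pinsker = -math.inf
    for _ in range(trials):
        U = [rng.random() for _ in range(size)]  # support points in [0, 1]
        p, q = random_pmf(size, rng), random_pmf(size, rng)
        l1 = sum(abs(a - b) for a, b in zip(p, q))
        mean_gap = abs(sum(u * (a - b) for u, a, b in zip(U, p, q)))
        kl = sum(a * math.log(a / b) for a, b in zip(p, q))  # natural log
        lemma1 = max(lemma1, mean_gap - 0.5 * l1)
        pinsker = max(pinsker, 0.5 * l1 ** 2 - kl)
    return lemma1, pinsker


print(max_violations())
```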



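Proposition 1. Fix the number of products $N$ and let $\rho$ be a CF algorithm. Then, for all $M$, $r \in \{0, 1/M, \dots, (M-1)/M\}$, $Y \in \bar{S}^{N\times(1-r)M}$, $Z \in \bar{S}^{N\times rM}$, and $\nu \in \sigma_N$,

$$d_n^{\mathrm{RMS}}(\rho,\nu,Y,Z) \le \sqrt{\frac{1}{2}d_n^{\mathrm{KL}}(\rho,\nu,Y,Z)}.$$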



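Proof. Recall that $\hat{x}_{\nu_k,x^{k-1},Y}$ and $\hat{x}_{\nu_k,x^{k-1},(Y,Z)}$ denote the expected ratings of product $\nu_k$ with respect to PMFs $P_{\nu_k,x^{k-1},Y}$ and $P_{\nu_k,x^{k-1},(Y,Z)}$, respectively. We have

$$\begin{aligned}
d_n^{\mathrm{RMS}}(\rho,\nu,Y,Z) &= \sqrt{\mathbb{E}\left[\frac{1}{n}\sum_{k=1}^{n}\left(\hat{x}_{\nu_k,x^{k-1},Y} - \hat{x}_{\nu_k,x^{k-1},(Y,Z)}\right)^2\right]} \\
&\le \sqrt{\mathbb{E}\left[\frac{1}{n}\sum_{k=1}^{n}\frac{1}{4}\left\|P_{\nu_k,x^{k-1},Y} - P_{\nu_k,x^{k-1},(Y,Z)}\right\|_1^2\right]} \\
&\le \sqrt{\mathbb{E}\left[\frac{1}{n}\sum_{k=1}^{n}\frac{1}{2}D\left(P_{\nu_k,x^{k-1},Y}\,\middle\|\,P_{\nu_k,x^{k-1},(Y,Z)}\right)\right]} = \sqrt{\frac{1}{2}d_n^{\mathrm{KL}}(\rho,\nu,Y,Z)},
\end{aligned}$$

where the first inequality follows from Lemma 1 and the second inequality follows from Pinsker's inequality, $D(p\|q) \ge \frac{1}{2}\|p - q\|_1^2$. □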

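Lemma 2. Consider a Bernoulli random variable $X$ and discrete random variables $W_1$ and $W_2$. Let $\hat X_1$ and $\hat X_2$ be the maximum a posteriori estimates of $X$ upon observing $W_1$ and $W_2$, respectively. That is,

$$\hat X_1 = \arg\max_{x\in\{0,1\}}\Pr(X = x \mid W_1), \qquad \hat X_2 = \arg\max_{x\in\{0,1\}}\Pr(X = x \mid W_2).$$

Then,

$$\Pr(\hat X_1 = X) - \Pr(\hat X_2 = X) \le \sqrt{\mathbb{E}\left[\left(\mathbb{E}[X \mid W_1] - \mathbb{E}[X \mid W_2]\right)^2\right]}.$$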
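Proof. We have

$$\begin{aligned}
\Pr(\hat{X}_1 = X) - \Pr(\hat{X}_2 = X) &= \sum_{w_1} \Pr(W_1 = w_1) \max_{x \in \{0,1\}} \Pr(X = x \mid W_1 = w_1) - \sum_{w_2} \Pr(W_2 = w_2) \max_{x \in \{0,1\}} \Pr(X = x \mid W_2 = w_2) \\
&= \sum_{w_1, w_2} \Pr(W_1 = w_1, W_2 = w_2) \left(\max_{x \in \{0,1\}} \Pr(X = x \mid W_1 = w_1) - \max_{x \in \{0,1\}} \Pr(X = x \mid W_2 = w_2)\right) \\
&\le \sum_{w_1, w_2} \Pr(W_1 = w_1, W_2 = w_2) \left|\Pr(X = 1 \mid W_1 = w_1) - \Pr(X = 1 \mid W_2 = w_2)\right| \\
&\le \sqrt{\sum_{w_1, w_2} \Pr(W_1 = w_1, W_2 = w_2) \left(\Pr(X = 1 \mid W_1 = w_1) - \Pr(X = 1 \mid W_2 = w_2)\right)^2} \\
&= \sqrt{\mathbb{E}\left[\left(\mathbb{E}[X \mid W_1] - \mathbb{E}[X \mid W_2]\right)^2\right]}.
\end{aligned}$$

The first inequality follows from a simple arithmetic argument: for $a, b \in [0,1]$, $\max\{a, 1-a\} - \max\{b, 1-b\} \le |a - b|$. The second inequality follows from Jensen's inequality. □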
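Lemma 2 holds for any joint distribution of $(X, W_1, W_2)$ with $X$ Bernoulli, so it can be tested by brute force on small random joint tables. The sketch below assumes NumPy and arbitrary alphabet sizes (three values for $W_1$, four for $W_2$); it computes the MAP accuracies directly as $\sum_w \max_x \Pr(X = x, W = w)$ and the right-hand side as a weighted sum over $(w_1, w_2)$, exactly as in the proof.

```python
import numpy as np

def map_accuracy(p_x_w):
    """Pr(MAP estimate of X equals X), from the joint table p_x_w[x, w]."""
    return p_x_w.max(axis=0).sum()

def lemma2_sides(joint):
    """Return (Pr(X1_hat = X) - Pr(X2_hat = X), sqrt(E[(E[X|W1] - E[X|W2])^2]))."""
    p_x_w1 = joint.sum(axis=2)                  # Pr(X = x, W1 = w1)
    p_x_w2 = joint.sum(axis=1)                  # Pr(X = x, W2 = w2)
    e1 = p_x_w1[1] / p_x_w1.sum(axis=0)         # E[X | W1 = w1]
    e2 = p_x_w2[1] / p_x_w2.sum(axis=0)         # E[X | W2 = w2]
    p_w1_w2 = joint.sum(axis=0)                 # Pr(W1 = w1, W2 = w2)
    rhs = np.sqrt(np.sum(p_w1_w2 * (e1[:, None] - e2[None, :]) ** 2))
    return map_accuracy(p_x_w1) - map_accuracy(p_x_w2), rhs

rng = np.random.default_rng(2)
violations = 0
for _ in range(2_000):
    # joint[x, w1, w2] = Pr(X = x, W1 = w1, W2 = w2), with X Bernoulli
    joint = rng.dirichlet(np.ones(2 * 3 * 4)).reshape(2, 3, 4)
    lhs, rhs = lemma2_sides(joint)
    violations += int(lhs > rhs + 1e-12)
print(violations)  # 0
```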

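Proposition 2. Fix the number of products $N$. Let $\rho$ be a CF algorithm and $S = \{0, 1\}$. Then, for all $M$, $r \in \{0, 1/M, \dots, (M-1)/M\}$, $Y \in \bar{S}^{N \times (1-r)M}$, $Z \in \bar{S}^{N \times rM}$, and $\nu \in \sigma_N$,

$$d^{B}_n(\rho, \nu, Y, Z) \le d^{RMS}_n(\rho, \nu, Y, Z).$$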

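Proof. Recall that $\tilde{x}_{\nu_k, x^{k-1}, Y}$ and $\tilde{x}_{\nu_k, x^{k-1}, (Y,Z)}$ denote the binary predictions on product $\nu_k$ with respect to PMFs $p_{\nu_k, x^{k-1}, Y}$ and $p_{\nu_k, x^{k-1}, (Y,Z)}$, respectively. We have

$$\begin{aligned}
d^{B}_n(\rho, \nu, Y, Z) &= \frac{1}{n}\sum_{k=1}^n \left(\Pr\left(\tilde{x}_{\nu_k, x^{k-1}, Y} = x_{\nu_k}\right) - \Pr\left(\tilde{x}_{\nu_k, x^{k-1}, (Y,Z)} = x_{\nu_k}\right)\right) \\
&\le \sqrt{\frac{1}{n}\sum_{k=1}^n \left(\Pr\left(\tilde{x}_{\nu_k, x^{k-1}, Y} = x_{\nu_k}\right) - \Pr\left(\tilde{x}_{\nu_k, x^{k-1}, (Y,Z)} = x_{\nu_k}\right)\right)^2} \\
&\le \sqrt{\frac{1}{n}\sum_{k=1}^n \mathbb{E}\left[\left(\hat{x}_{\nu_k, x^{k-1}, Y} - \hat{x}_{\nu_k, x^{k-1}, (Y,Z)}\right)^2\right]} \\
&= d^{RMS}_n(\rho, \nu, Y, Z).
\end{aligned}$$

The first inequality follows from Jensen's inequality and the second inequality follows from Lemma 2. □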



A.2 Results for Linear Collaborative Filtering Algorithms

Theorem 1 provides a distortion bound for linear CF algorithms. Lemmas 3 and 4 help prove it.

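Lemma 3. Fix the number of products $N$ and let $\rho$ be a probabilistic CF algorithm. Then, for all $M$, $r \in \{0, 1/M, \dots, (M-1)/M\}$, $Y \in \bar{S}^{N \times (1-r)M}$, $Z \in \bar{S}^{N \times rM}$, and $\nu \in \sigma_N$,

$$d^{KL}_n(\rho, \nu, Y, Z) \le \frac{1}{n} D\left(\psi^{Y}_{\rho} \,\|\, \psi^{(Y,Z)}_{\rho}\right).$$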

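Proof. We denote by $\psi^{W}_{\rho,n}(\cdot \mid x)$ the conditional PMF of $x_n$ conditioned on $x$ based on $W$. With expectations taken over the active user's ratings vector $x$ drawn from $\psi^{Y}_{\rho}$, we have

$$\begin{aligned}
d^{KL}_n(\rho, \nu, Y, Z) &= \frac{1}{n}\sum_{k=1}^n \mathbb{E}\left[D\left(p_{\nu_k, x^{k-1}, Y} \,\|\, p_{\nu_k, x^{k-1}, (Y,Z)}\right)\right] \\
&= \frac{1}{n}\sum_{k=1}^n \mathbb{E}\left[D\left(\psi^{Y}_{\rho,\nu_k}(\cdot \mid x^{k-1}) \,\|\, \psi^{(Y,Z)}_{\rho,\nu_k}(\cdot \mid x^{k-1})\right)\right] \\
&\le \frac{1}{n}\sum_{k=1}^N \mathbb{E}\left[D\left(\psi^{Y}_{\rho,\nu_k}(\cdot \mid x^{k-1}) \,\|\, \psi^{(Y,Z)}_{\rho,\nu_k}(\cdot \mid x^{k-1})\right)\right] \\
&= \frac{1}{n} D\left(\psi^{Y}_{\rho} \,\|\, \psi^{(Y,Z)}_{\rho}\right).
\end{aligned}$$

The inequality holds because KL divergences are nonnegative and $n \le N$. The last equality follows from the chain rule of KL divergence.

□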
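The last step of Lemma 3 is the chain rule of KL divergence: summing the expected conditional divergences over all $N$ products recovers the divergence between the joint PMFs, and keeping only the first $n$ terms can only lower the sum. The sketch below checks both facts with NumPy on random joint PMFs over three products with three rating values each; the arrays `P` and `Q` are hypothetical stand-ins for $\psi^Y_\rho$ and $\psi^{(Y,Z)}_\rho$, and products are inspected in the order 1, 2, 3.

```python
import numpy as np

def kl(p, q):
    """D(p || q) in nats for strictly positive arrays of the same shape."""
    return float(np.sum(p * np.log(p / q)))

def chain_term(P, Q, k):
    """E_{x_<k ~ P}[ D(P(x_k | x_<k) || Q(x_k | x_<k)) ] for 0-based product index k."""
    later = tuple(range(k + 1, P.ndim))
    Pk, Qk = P.sum(axis=later), Q.sum(axis=later)   # PMFs of (x_0, ..., x_k)
    p_cond = Pk / Pk.sum(axis=-1, keepdims=True)    # P(x_k | x_<k)
    q_cond = Qk / Qk.sum(axis=-1, keepdims=True)    # Q(x_k | x_<k)
    weight = Pk.sum(axis=-1)                        # P(x_<k)
    inner = np.sum(p_cond * np.log(p_cond / q_cond), axis=-1)
    return float(np.sum(weight * inner))

rng = np.random.default_rng(3)
shape = (3, 3, 3)   # N = 3 products, 3 rating values each
P = rng.dirichlet(np.ones(27)).reshape(shape)   # stands in for psi^Y
Q = rng.dirichlet(np.ones(27)).reshape(shape)   # stands in for psi^(Y,Z)

terms = [chain_term(P, Q, k) for k in range(P.ndim)]
print(np.isclose(sum(terms), kl(P, Q)))   # True: chain rule of KL divergence
print(sum(terms[:2]) <= kl(P, Q))         # True: dropping later terms only lowers the sum
```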



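Lemma 4. Fix the number of products $N$ and let $\rho$ be a linear CF algorithm. Then, for all $M$, $r \in \{0, 1/M, \dots, (M-1)/M\}$, $Y \in \bar{S}^{N \times (1-r)M}$, $Z \in \bar{S}^{N \times rM}$, and $\nu \in \sigma_N$,

$$D\left(\psi^{Y}_{\rho} \,\|\, \psi^{(Y,Z)}_{\rho}\right) \le \ln \frac{1}{1 - r}.$$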
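Proof. For any $x \in S^N$, since $\rho$ is linear, we have

$$\frac{\psi^{Y}_{\rho}(x)}{\psi^{(Y,Z)}_{\rho}(x)} = \frac{\psi^{Y}_{\rho}(x)}{(1-r)\psi^{Y}_{\rho}(x) + r\psi^{Z}_{\rho}(x)} \le \frac{1}{1-r}.$$

Then,

$$D\left(\psi^{Y}_{\rho} \,\|\, \psi^{(Y,Z)}_{\rho}\right) = \sum_{x \in S^N} \psi^{Y}_{\rho}(x) \ln \frac{\psi^{Y}_{\rho}(x)}{\psi^{(Y,Z)}_{\rho}(x)} \le \sum_{x \in S^N} \psi^{Y}_{\rho}(x) \ln \frac{1}{1-r} = \ln \frac{1}{1-r}.$$

□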
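Lemma 4 only uses the mixture form that linearity imposes, so it can be checked on arbitrary PMFs. The sketch below, assuming NumPy and a hypothetical helper `linear_mix` for the mixture $(1-r)\psi^Y_\rho + r\psi^Z_\rho$, verifies the bound on random PMFs and then shows that it is attained when the manipulators' PMF puts all of its mass where the honest PMF puts none.

```python
import numpy as np

def kl(p, q):
    """D(p || q) in nats; terms with p = 0 contribute 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def linear_mix(psi_y, psi_z, r):
    """Hypothetical helper: PMF a linear algorithm yields when Z is a fraction r of the data."""
    return (1 - r) * psi_y + r * psi_z

rng = np.random.default_rng(4)
ok = True
for _ in range(5_000):
    r = rng.uniform(0.0, 0.95)
    psi_y = rng.dirichlet(np.ones(16))
    psi_z = rng.dirichlet(np.ones(16))
    ok = ok and kl(psi_y, linear_mix(psi_y, psi_z, r)) <= np.log(1 / (1 - r)) + 1e-12
print(ok)  # True

# Worst case: manipulators put all their mass where honest users put none.
psi_y = np.array([0.5, 0.5, 0.0, 0.0])
psi_z = np.array([0.0, 0.0, 0.5, 0.5])
r = 0.3
print(np.isclose(kl(psi_y, linear_mix(psi_y, psi_z, r)), np.log(1 / (1 - r))))  # True
```

The equality case suggests that the $\ln(1/(1-r))$ bound of Lemma 4 cannot be improved without further assumptions on the manipulators' ratings.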

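Theorem 1. Fix the number of products $N$ and let $\rho$ be a linear CF algorithm. Then, for all $M$, $r \in \{0, 1/M, \dots, (M-1)/M\}$, $Y \in \bar{S}^{N \times (1-r)M}$, $Z \in \bar{S}^{N \times rM}$, and $\nu \in \sigma_N$,

$$d^{KL}_n(\rho, \nu, Y, Z) \le \frac{1}{n} \ln \frac{1}{1-r}.$$

Proof. From Lemmas 3 and 4, we have

$$d^{KL}_n(\rho, \nu, Y, Z) \le \frac{1}{n} D\left(\psi^{Y}_{\rho} \,\|\, \psi^{(Y,Z)}_{\rho}\right) \le \frac{1}{n} \ln \frac{1}{1-r}.$$

□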
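To make the scale of Theorem 1 concrete, the sketch below tabulates $\frac{1}{n}\ln\frac{1}{1-r}$ for a few values of $r$ and $n$, and also the RMS bound obtained by combining it with Proposition 1, that is $\sqrt{\frac{1}{2}\cdot\frac{1}{n}\ln\frac{1}{1-r}}$. The function names are hypothetical; only the formulas come from the results above.

```python
import math

def kl_distortion_bound(r, n):
    """Theorem 1: (1/n) * ln(1 / (1 - r)) for a linear CF algorithm."""
    if not 0 <= r < 1 or n < 1:
        raise ValueError("need 0 <= r < 1 and n >= 1")
    return math.log(1.0 / (1.0 - r)) / n

def rms_distortion_bound(r, n):
    """Theorem 1 combined with Proposition 1: sqrt(0.5 * KL bound)."""
    return math.sqrt(0.5 * kl_distortion_bound(r, n))

for r in (0.1, 0.5, 0.9):
    for n in (1, 10, 100):
        print(f"r={r:.1f} n={n:3d}  KL<={kl_distortion_bound(r, n):.4f}  "
              f"RMS<={rms_distortion_bound(r, n):.4f}")
```

For instance, even when half of the ratings vectors come from manipulators ($r = 0.5$), after $n = 10$ ratings the KL distortion of a linear algorithm is at most $\ln 2 / 10 \approx 0.069$; the bound decays as $1/n$ and grows without limit only as $r \to 1$.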



A.3 Results for Asymptotically Linear Collaborative Filtering Algorithms

Theorem 2 provides a distortion bound for asymptotically linear CF algorithms. Lemmas 5 and 6 help prove it. Theorem 3 states the relationship between consistent and asymptotically linear CF algorithms. Lemmas 7, 8, and 9 help prove it.

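Lemma 5. Let $\{\mu_m\}$ and $\{\nu_m\}$ be two sequences of random PMFs over a fixed finite sample space $\Omega$. If for all $\epsilon > 0$, $\Pr\left(D(\mu_m \,\|\, \nu_m) > \epsilon\right) \to 0$, then for all $\epsilon > 0$,

$$\Pr\left(\sum_{\omega \in \Omega} \mu_m(\omega) \left|\log \frac{\mu_m(\omega)}{\nu_m(\omega)}\right| > \epsilon\right) \to 0.$$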



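Proof. For $\epsilon > 0$ and $m$, we denote by $A_{m,\epsilon}$ the event that for all $\omega \in \Omega$, at least one of the following holds: $|\log(\mu_m(\omega)/\nu_m(\omega))| < \epsilon$ or $\max\{\mu_m(\omega), \nu_m(\omega)\} < \epsilon$. We now prove that for all $\epsilon > 0$, $\Pr(A_{m,\epsilon}) \to 1$. To see this, for any given $\epsilon > 0$, we let $\delta_\epsilon$ be in $(0, \epsilon(1 - e^{-\epsilon}))$ and denote by $B_{m,\delta_\epsilon}$ the event that for all $\omega \in \Omega$, $|\mu_m(\omega) - \nu_m(\omega)| < \delta_\epsilon$. We now show that $B_{m,\delta_\epsilon} \subseteq A_{m,\epsilon}$. If $|\mu_m(\omega) - \nu_m(\omega)| < \delta_\epsilon$ for all $\omega$, then for any $\omega'$ such that $\max\{\mu_m(\omega'), \nu_m(\omega')\} \ge \epsilon$, we have $\min\{\mu_m(\omega'), \nu_m(\omega')\} > \epsilon - \delta_\epsilon$. This implies

$$\left|\log \frac{\mu_m(\omega')}{\nu_m(\omega')}\right| = \max\left\{\log\left(\frac{\mu_m(\omega') - \nu_m(\omega')}{\nu_m(\omega')} + 1\right),\ \log\left(\frac{\nu_m(\omega') - \mu_m(\omega')}{\mu_m(\omega')} + 1\right)\right\} < \log\left(\frac{\delta_\epsilon}{\epsilon - \delta_\epsilon} + 1\right) < \epsilon,$$

where the last inequality follows from our choice of $\delta_\epsilon$. Hence, $B_{m,\delta_\epsilon} \subseteq A_{m,\epsilon}$, and for any $\epsilon$ and a corresponding $\delta_\epsilon$, we have $\Pr(A_{m,\epsilon}) \ge \Pr(B_{m,\delta_\epsilon}) \to 1$, where convergence follows from Pinsker's inequality.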

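For each $m$ and each realization of $\mu_m$ and $\nu_m$, we let $\Omega_m = \{\omega \in \Omega : \mu_m(\omega) < \nu_m(\omega)\}$, let

$$\tau_m = \sum_{\omega \in \Omega_m} \mu_m(\omega) \log \frac{\mu_m(\omega)}{\nu_m(\omega)},$$

and for $\epsilon > 0$, denote by $C_{m,\epsilon}$ the event that $|\tau_m| \le \epsilon$. We now show that for any $\epsilon > 0$, $A_{m,\gamma_\epsilon} \subseteq C_{m,\epsilon}$, where $\gamma_\epsilon \in (0, \min\{\epsilon/|\Omega|, 1/e\}]$ satisfies $\gamma_\epsilon |\log \gamma_\epsilon| \le \epsilon/|\Omega|$. To see this, we first let $\Omega^1_{m,\gamma_\epsilon} = \{\omega \in \Omega_m : |\log(\mu_m(\omega)/\nu_m(\omega))| < \gamma_\epsilon\}$ and $\Omega^2_{m,\gamma_\epsilon} = \{\omega \in \Omega_m : \max\{\mu_m(\omega), \nu_m(\omega)\} < \gamma_\epsilon\} \setminus \Omega^1_{m,\gamma_\epsilon}$. Note that $\Omega^1_{m,\gamma_\epsilon} \cap \Omega^2_{m,\gamma_\epsilon} = \emptyset$. If for all $\omega \in \Omega_m$, $|\log(\mu_m(\omega)/\nu_m(\omega))| < \gamma_\epsilon$ or $\max\{\mu_m(\omega), \nu_m(\omega)\} < \gamma_\epsilon$, then $\Omega^1_{m,\gamma_\epsilon} \cup \Omega^2_{m,\gamma_\epsilon} = \Omega_m$. This implies

$$\begin{aligned}
|\tau_m| = \sum_{\omega \in \Omega_m} \mu_m(\omega) \left|\log \frac{\mu_m(\omega)}{\nu_m(\omega)}\right| &\le |\Omega^1_{m,\gamma_\epsilon}|\,\gamma_\epsilon + \sum_{\omega \in \Omega^2_{m,\gamma_\epsilon}} \mu_m(\omega) \left|\log \mu_m(\omega)\right| \\
&\le |\Omega^1_{m,\gamma_\epsilon}|\,\gamma_\epsilon + |\Omega^2_{m,\gamma_\epsilon}|\,\gamma_\epsilon |\log \gamma_\epsilon| \le \epsilon.
\end{aligned}$$

The first inequality follows from the definitions of $\Omega^1_{m,\gamma_\epsilon}$ and $\Omega_m$. The second inequality follows from the definition of $\Omega^2_{m,\gamma_\epsilon}$ and the fact that $\gamma_\epsilon \le 1/e$. The third inequality follows from the other constraints on $\gamma_\epsilon$. Hence, for any $\epsilon > 0$ and a corresponding $\gamma_\epsilon$ satisfying the aforesaid constraints, $A_{m,\gamma_\epsilon} \subseteq C_{m,\epsilon}$.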
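Note that for all $m$ and all realizations of $\mu_m$ and $\nu_m$,

$$\sum_{\omega \in \Omega} \mu_m(\omega) \left|\log \frac{\mu_m(\omega)}{\nu_m(\omega)}\right| = D(\mu_m \,\|\, \nu_m) - 2\tau_m.$$

Hence, for any $\epsilon > 0$, a corresponding $\gamma_\epsilon$, and any $m$,

$$\begin{aligned}
\Pr\left(\sum_{\omega \in \Omega} \mu_m(\omega) \left|\log \frac{\mu_m(\omega)}{\nu_m(\omega)}\right| > 3\epsilon\right) &= \Pr\left(D(\mu_m \,\|\, \nu_m) - 2\tau_m > 3\epsilon,\ \bar{A}_{m,\gamma_\epsilon}\right) + \Pr\left(D(\mu_m \,\|\, \nu_m) - 2\tau_m > 3\epsilon,\ A_{m,\gamma_\epsilon}\right) \\
&\le \Pr\left(\bar{A}_{m,\gamma_\epsilon}\right) + \Pr\left(D(\mu_m \,\|\, \nu_m) - 2\tau_m > 3\epsilon,\ C_{m,\epsilon}\right) \\
&\le \Pr\left(\bar{A}_{m,\gamma_\epsilon}\right) + \Pr\left(D(\mu_m \,\|\, \nu_m) > \epsilon\right) \to 0.
\end{aligned}$$

Here, $\bar{A}_{m,\gamma_\epsilon}$ denotes the complement of $A_{m,\gamma_\epsilon}$. The last inequality follows from the definition of $C_{m,\epsilon}$. Convergence follows from our original assumption and the fact that $\Pr(A_{m,\gamma_\epsilon}) \to 1$. Because $\epsilon > 0$ is arbitrary, the lemma follows. □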
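Two elementary facts carry the proof of Lemma 5, and both are easy to check numerically with NumPy: the identity expressing $\sum_\omega \mu_m(\omega)|\log(\mu_m(\omega)/\nu_m(\omega))|$ as the KL divergence minus twice the (nonpositive) sum over the set where $\mu_m < \nu_m$ (written $\tau_m$ here), and the fact that any $\delta_\epsilon < \epsilon(1 - e^{-\epsilon})$ keeps $\log(\delta_\epsilon/(\epsilon - \delta_\epsilon) + 1)$ below $\epsilon$. The random PMFs and the factor 0.999 are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
mu = rng.dirichlet(np.ones(10))
nu = rng.dirichlet(np.ones(10))

log_ratio = np.log(mu / nu)
kl_div = np.sum(mu * log_ratio)                      # D(mu || nu)
omega_m = mu < nu                                    # the set Omega_m
tau = np.sum(mu[omega_m] * log_ratio[omega_m])       # tau_m, never positive
abs_sum = np.sum(mu * np.abs(log_ratio))
print(np.isclose(abs_sum, kl_div - 2 * tau))         # True

# The choice of delta_eps in the first part of the proof.
for eps in (0.01, 0.1, 1.0):
    delta = 0.999 * eps * (1 - np.exp(-eps))         # inside (0, eps * (1 - e^{-eps}))
    assert np.log(delta / (eps - delta) + 1) < eps
```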

Lemma 6. Let $\{\mu_m\}$, $\{\nu_m\}$, and $\{\chi_m\}$ be three sequences of random PMFs over a fixed finite sample space $\Omega$. Suppose that for all $\epsilon > 0$, $\Pr\left(D(\mu_m \,\|\, \nu_m) > \epsilon\right) \to 0$. Further, suppose there exists $b > 0$ such that for all $m$, realizations of $\chi_m$ and $\mu_m$, and $\omega \in \Omega$, $\chi_m(\omega)/\mu_m(\omega) \le b$. Then, for all $\epsilon > 0$, $\Pr\left(\left|D(\chi_m \,\|\, \mu_m) - D(\chi_m \,\|\, \nu_m)\right| > \epsilon\right) \to 0$.

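Proof. For any $\epsilon > 0$ and $m$,

$$\begin{aligned}
\Pr\left(\left|D(\chi_m \,\|\, \mu_m) - D(\chi_m \,\|\, \nu_m)\right| > \epsilon\right) &= \Pr\left(\left|\sum_{\omega \in \Omega} \chi_m(\omega) \log \frac{\nu_m(\omega)}{\mu_m(\omega)}\right| > \epsilon\right) \\
&\le \Pr\left(\sum_{\omega \in \Omega} \chi_m(\omega) \left|\log \frac{\mu_m(\omega)}{\nu_m(\omega)}\right| > \epsilon\right) \\
&\le \Pr\left(b \sum_{\omega \in \Omega} \mu_m(\omega) \left|\log \frac{\mu_m(\omega)}{\nu_m(\omega)}\right| > \epsilon\right) \to 0,
\end{aligned}$$

where convergence follows from Lemma 5. □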
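The two inequalities in the proof of Lemma 6 are deterministic, so they can be checked on one realization. In the sketch below (NumPy; the reweighting range [0.5, 2] is an arbitrary choice), $\chi$ is built as a reweighting of $\mu$ so that $\chi/\mu$ is bounded, and $b$ is taken as the largest such ratio. In the proof of Theorem 2 the same role is played by $b = 1/(1-r)$.

```python
import numpy as np

def kl(p, q):
    """D(p || q) in nats for strictly positive PMFs."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(6)
mu = rng.dirichlet(np.ones(12))
nu = rng.dirichlet(np.ones(12))
# chi is a reweighting of mu, so chi / mu is bounded; b is that bound.
weights = rng.uniform(0.5, 2.0, 12)
chi = mu * weights / np.sum(mu * weights)
b = float(np.max(chi / mu))

gap = kl(chi, mu) - kl(chi, nu)
print(np.isclose(gap, np.sum(chi * np.log(nu / mu))))                 # True
print(abs(gap) <= b * np.sum(mu * np.abs(np.log(mu / nu))) + 1e-12)   # True
```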

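Theorem 2. Fix the number of products $N$ and a set $\Psi$ of joint PMFs over $S^N \times \bar{S}^N$. Let $\rho$ be a CF algorithm asymptotically linear with respect to $\Psi$. Then, for all $\phi, \pi \in \Psi$, $r \in [0, 1)$, $\nu \in \sigma_N$, $n \in \mathbb{Z}_N$, and $\epsilon > 0$,

$$\lim_{m \to \infty} \Pr\left(d^{KL}_n(\rho, \nu, Y_m, Z_m) > \frac{1}{n} \ln \frac{1}{1-r} + \epsilon\right) = 0,$$

where, for each $m$, $Y_m = (y^1, \dots, y^{m-\ell}) \in \bar{S}^{N \times (m-\ell)}$, $Z_m = (z^1, \dots, z^{\ell}) \in \bar{S}^{N \times \ell}$, $\ell \sim \mathrm{Binomial}(m, r)$, and $y^1, \dots, y^{m-\ell} \sim \phi_{\bar{S}}$ and $z^1, \dots, z^{\ell} \sim \pi_{\bar{S}}$ are i.i.d. sequences.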

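Proof. Let finite sample space $\Omega = S^N$. For each $m$, let random PMFs $\mu_m = (1-r)\psi^{Y_m}_{\rho} + r\psi^{Z_m}_{\rho}$, $\nu_m = \psi^{(Y_m, Z_m)}_{\rho}$, and $\chi_m = \psi^{Y_m}_{\rho}$. By the definition of asymptotically linear CF algorithms, for all $\epsilon > 0$, $\Pr\left(D(\mu_m \,\|\, \nu_m) > \epsilon\right) \to 0$. Clearly, for all $m$, realizations of $\chi_m$ and $\mu_m$, and $s \in S^N$, $\chi_m(s)/\mu_m(s) \le 1/(1-r)$. Hence, for all $\epsilon > 0$,

$$\begin{aligned}
\Pr\left(d^{KL}_n(\rho, \nu, Y_m, Z_m) > \frac{1}{n} \ln \frac{1}{1-r} + \epsilon\right) &\le \Pr\left(D\left(\psi^{Y_m}_{\rho} \,\|\, \psi^{(Y_m, Z_m)}_{\rho}\right) > \ln \frac{1}{1-r} + n\epsilon\right) \\
&\le \Pr\left(D\left(\psi^{Y_m}_{\rho} \,\|\, \psi^{(Y_m, Z_m)}_{\rho}\right) > D\left(\psi^{Y_m}_{\rho} \,\|\, (1-r)\psi^{Y_m}_{\rho} + r\psi^{Z_m}_{\rho}\right) + n\epsilon\right) \\
&\le \Pr\left(\left|D(\chi_m \,\|\, \nu_m) - D(\chi_m \,\|\, \mu_m)\right| > n\epsilon\right) \to 0.
\end{aligned}$$

The first inequality follows from Lemma 3. The second inequality holds because $D\left(\psi^{Y_m}_{\rho} \,\|\, (1-r)\psi^{Y_m}_{\rho} + r\psi^{Z_m}_{\rho}\right) \le \ln(1/(1-r))$, and convergence follows from Lemma 6. □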
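The proof of Theorem 2 compares the algorithm's output on $(Y_m, Z_m)$ with the $r$-weighted mixture, even though the realized fraction of manipulator vectors is $\ell/m$ with $\ell \sim \mathrm{Binomial}(m, r)$. The sketch below illustrates why that comparison is asymptotically harmless for a linear algorithm: its output on $(Y_m, Z_m)$ is the $\ell/m$-weighted mixture, whose KL divergence from the $r$-weighted mixture shrinks as $m$ grows. It assumes NumPy and uses fixed random PMFs `psi_y` and `psi_z` as hypothetical stand-ins for the outputs on $Y_m$ and $Z_m$; it is an illustration, not a proof.

```python
import numpy as np

def kl(p, q):
    """D(p || q) in nats for strictly positive PMFs."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(7)
r = 0.2                              # expected fraction of manipulator vectors
psi_y = rng.dirichlet(np.ones(8))    # hypothetical output on the honest vectors Y_m
psi_z = rng.dirichlet(np.ones(8))    # hypothetical output on the manipulators' vectors Z_m
target = (1 - r) * psi_y + r * psi_z

divergences = {}
for m in (10, 1_000, 100_000, 1_000_000):
    ell = rng.binomial(m, r)                                # ell ~ Binomial(m, r)
    psi_yz = (1 - ell / m) * psi_y + (ell / m) * psi_z      # linear algorithm on (Y_m, Z_m)
    divergences[m] = kl(target, psi_yz)
    print(f"m={m:>9,d}  D={divergences[m]:.2e}")
```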

Lemma 7. Let $U \subset \mathbb{R}^d$ be a compact set. Consider a fixed vector $u \in U$ and a sequence of random vectors $\{u_m\}$ for which $\Pr(u_m \in U) \to 1$. For any continuous function $f : U \to \mathbb{R}$, if $\Pr(\|u_m - u\|_1 > \epsilon) \to 0$ for all $\epsilon > 0$, then $\Pr(|f(u_m) - f(u)| > \epsilon) \to 0$ for all $\epsilon > 0$.

Proof. Because the continuous function $f$ defined on the compact set $U$ is uniformly continuous, for each $\epsilon > 0$, there exists $\delta > 0$ such that for any $v, v' \in U$, $\|v - v'\|_1 < \delta$ implies that $|f(v) - f(v')| < \epsilon$. Hence, for all $\epsilon > 0$,

$$\Pr\left(|f(u_m) - f(u)| < \epsilon\right) \ge \Pr\left(|f(u_m) - f(u)| < \epsilon,\ u_m \in U\right) \ge \Pr\left(\|u_m - u\|_1 < \delta,\ u_m \in U\right) \to 1.$$

□



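Lemma 8. Fix a finite sample space $\Omega$. Let $\{\mu_m\}$ and $\{\nu_m\}$ be two sequences of random PMFs over $\Omega$ and let $\mu$ be a fixed PMF over $\Omega$. If for all $\epsilon > 0$, $\Pr\left(D(\mu_m \,\|\, \mu) > \epsilon\right) \to 0$ and $\Pr\left(D(\nu_m \,\|\, \mu) > \epsilon\right) \to 0$, then for all $\epsilon > 0$, $\Pr\left(D(\mu_m \,\|\, \nu_m) > \epsilon\right) \to 0$.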

Proof. We first identify the support of $\mu$. Without loss of generality, we let $\mu(\omega_i) = 0$ for all $1 \le i \le l$ and $\mu(\omega_i) > \delta$ for all $l < i \le |\Omega|$, for some $l \ge 0$ and $\delta > 0$. In the following, we represent PMFs as vectors in $\mathbb{R}^{|\Omega|}$. To this end, we define a set $T = \{(t_1, \dots, t_{|\Omega|}) : \sum_i t_i = 1,\ t_i = 0\ \forall i \le l,\ t_i \ge \delta\ \forall i > l\}$. Let compact set $U = T \times T$. We let $u = (\mu, \mu) \in U$ and define a sequence of random vectors $\{u_m\}$ where each $u_m = (\mu_m, \nu_m)$. Let continuous function $f : U \to \mathbb{R}$ be the KL divergence $D(\cdot \,\|\, \cdot)$. By examining the absolute continuity of $\mu_m$ and $\nu_m$ with respect to $\mu$ and applying Pinsker's inequality, we have $\Pr(u_m \in U) \to 1$ and further, for all $\epsilon > 0$, $\Pr(\|u_m - u\|_1 > \epsilon) \to 0$. Hence, for all $\epsilon > 0$, by Lemma 7, $\Pr\left(D(\mu_m \,\|\, \nu_m) > \epsilon\right) = \Pr\left(\left|D(\mu_m \,\|\, \nu_m) - D(\mu \,\|\, \mu)\right| > \epsilon\right) = \Pr\left(|f(u_m) - f(u)| > \epsilon\right) \to 0$. □

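Lemma 9. Fix a finite sample space $\Omega$. Let $\{\mu_m\}$ and $\{\nu_m\}$ be two sequences of random PMFs over $\Omega$ and let $\mu$ and $\nu$ be two fixed PMFs over $\Omega$. If for all $\epsilon > 0$, $\Pr\left(D(\mu_m \,\|\, \mu) > \epsilon\right) \to 0$ and $\Pr\left(D(\nu_m \,\|\, \nu) > \epsilon\right) \to 0$, then for all $r \in [0, 1]$ and $\epsilon > 0$,

$$\Pr\left(D\left((1-r)\mu_m + r\nu_m \,\|\, (1-r)\mu + r\nu\right) > \epsilon\right) \to 0.$$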

Proof. The proof here is similar to that for Lemma 8. For a fixed $r$, let PMF $\chi = (1-r)\mu + r\nu$. Without loss of generality, we let $\chi(\omega_i) = 0$ for all $1 \le i \le l$ and $\chi(\omega_i) > \delta$ for all $l < i \le |\Omega|$, for some $l \ge 0$ and $\delta > 0$. In the following, we represent PMFs as vectors in $\mathbb{R}^{|\Omega|}$. To this end, we define a set $T = \{(t_1, \dots, t_{|\Omega|}) : \sum_i t_i = 1,\ t_i = 0\ \forall i \le l,\ t_i \ge \delta\ \forall i > l\}$. Let compact set $U = T \times T$. We let $u = ((1-r)\mu + r\nu, (1-r)\mu + r\nu) \in U$ and define a sequence of random vectors $\{u_m\}$ where each $u_m = ((1-r)\mu_m + r\nu_m, (1-r)\mu + r\nu)$. Let continuous function $f : U \to \mathbb{R}$ be the KL divergence $D(\cdot \,\|\, \cdot)$. By examining the absolute continuity of $\mu_m$ with respect to $\mu$ and that of $\nu_m$ with respect to $\nu$ and applying Pinsker's inequality, we have $\Pr(u_m \in U) \to 1$, and for all $\epsilon > 0$, $\Pr(\|u_m - u\|_1 > \epsilon) \to 0$. Hence, for all $\epsilon > 0$, by Lemma 7,

$$\begin{aligned}
\Pr\left(D\left((1-r)\mu_m + r\nu_m \,\|\, (1-r)\mu + r\nu\right) > \epsilon\right) &= \Pr\left(\left|D\left((1-r)\mu_m + r\nu_m \,\|\, (1-r)\mu + r\nu\right) - D\left((1-r)\mu + r\nu \,\|\, (1-r)\mu + r\nu\right)\right| > \epsilon\right) \\
&= \Pr\left(|f(u_m) - f(u)| > \epsilon\right) \to 0.
\end{aligned}$$

□

Theorem 3. Any probabilistic CF algorithm consistent with respect to an identifiable and convex set $\Psi$ is asymptotically linear with respect to $\Psi$.

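Proof. We use the notation in the definition of asymptotically linear CF algorithms in Section 4.3 and in Lemmas 8 and 9. Fix an algorithm $\rho$ consistent with respect to $\Psi$. Let finite sample space $\Omega = S^N$. For each $m$, let random PMFs $\mu_m = \psi^{Y_m}_{\rho}$, $\nu_m = \psi^{Z_m}_{\rho}$, and $\chi_m = \psi^{(Y_m, Z_m)}_{\rho}$. Let $\mu = \phi_S$ and $\nu = \pi_S$. Fix $r \in [0, 1]$. By the consistency of $\rho$ and the convexity of $\Psi$, we have for all $\epsilon > 0$, $\Pr\left(D(\mu_m \,\|\, \mu) > \epsilon\right) \to 0$, $\Pr\left(D(\nu_m \,\|\, \nu) > \epsilon\right) \to 0$, and $\Pr\left(D(\chi_m \,\|\, (1-r)\mu + r\nu) > \epsilon\right) \to 0$. By Lemma 9, for all $\epsilon > 0$, $\Pr\left(D\left((1-r)\mu_m + r\nu_m \,\|\, (1-r)\mu + r\nu\right) > \epsilon\right) \to 0$. Then for all $\epsilon > 0$,

$$\Pr\left(D\left((1-r)\psi^{Y_m}_{\rho} + r\psi^{Z_m}_{\rho} \,\|\, \psi^{(Y_m, Z_m)}_{\rho}\right) > \epsilon\right) = \Pr\left(D\left((1-r)\mu_m + r\nu_m \,\|\, \chi_m\right) > \epsilon\right) \to 0,$$

where convergence follows from Lemma 8. □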

A.4 Results for Kernel Density Estimation and Naive Bayes Algorithms

Propositions in this section pertain to KDE and NB algorithms. 
Proposition 3. Any KDE algorithm is a linear CF algorithm. Any linear CF algorithm is a KDE
algorithm. 

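Proof. Fix a KDE algorithm $\rho$ with kernels $K_w$. Given $W_1 \in \bar{S}^{N \times M_1}$ and $W_2 \in \bar{S}^{N \times M_2}$, for each $x \in S^N$,

$$\begin{aligned}
\psi^{(W_1, W_2)}_{\rho}(x) &= \frac{1}{M_1 + M_2} \sum_{w \in (W_1, W_2)} K_w(x) = \frac{M_1}{M_1 + M_2} \cdot \frac{1}{M_1} \sum_{w \in W_1} K_w(x) + \frac{M_2}{M_1 + M_2} \cdot \frac{1}{M_2} \sum_{w \in W_2} K_w(x) \\
&= \frac{M_1}{M_1 + M_2} \psi^{W_1}_{\rho}(x) + \frac{M_2}{M_1 + M_2} \psi^{W_2}_{\rho}(x).
\end{aligned}$$

Hence, $\rho$ is linear.

To show the converse, consider a linear CF algorithm $\rho'$ that generates $\psi^{W}_{\rho'}$ based on $W$. Applying linearity repeatedly gives $\psi^{W}_{\rho'} = \frac{1}{M} \sum_{w \in W} \psi^{w}_{\rho'}$, where $\psi^{w}_{\rho'}$ is generated from the training set consisting of the single ratings vector $w$. Hence, a KDE algorithm where kernel $K_w$ is set equal to $\psi^{w}_{\rho'}$ for each $w \in \bar{S}^N$ is equivalent to $\rho'$. □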
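The linearity argument in Proposition 3 is a reweighting of averages, which the sketch below reproduces for a tiny example by enumerating all of $S^N$. The product kernel `kernel` (with a bandwidth-like parameter `H`), the rating set, and the training vectors are all hypothetical choices made for illustration; the paper's kernels may differ. Missing ratings are encoded as `None`, and the KDE output is $\psi^W(x) = \frac{1}{M}\sum_{w \in W} K_w(x)$.

```python
import itertools
import numpy as np

S = (0, 1, 2)          # hypothetical rating values
N = 3                  # hypothetical number of products
H = 0.7                # hypothetical kernel parameter
ALL_X = list(itertools.product(S, repeat=N))   # enumerate S^N

def kernel(w, x):
    """Hypothetical product kernel K_w(x); None in w marks a missing rating '?'."""
    value = 1.0
    for w_n, x_n in zip(w, x):
        if w_n is None:
            value *= 1.0 / len(S)                 # missing rating: uniform
        elif w_n == x_n:
            value *= H
        else:
            value *= (1.0 - H) / (len(S) - 1)
    return value

def kde(W):
    """psi^W(x) = (1/M) * sum_{w in W} K_w(x), as a vector over ALL_X."""
    return np.array([np.mean([kernel(w, x) for w in W]) for x in ALL_X])

W1 = [(0, 1, None), (2, 2, 1)]
W2 = [(1, None, None), (0, 0, 0), (2, 1, 0)]
M1, M2 = len(W1), len(W2)
lhs = kde(W1 + W2)
rhs = M1 / (M1 + M2) * kde(W1) + M2 / (M1 + M2) * kde(W2)
print(np.allclose(lhs, rhs))        # True: the KDE algorithm is linear
print(np.isclose(lhs.sum(), 1.0))   # True: each K_w is a PMF over S^N
```

Any choice of kernels that are PMFs over $S^N$ would pass the same check, which mirrors the fact that linearity does not depend on the form of $K_w$.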



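Proposition 4. Fix $q \in [0, 1)$. Let $\Psi^q$ be the set of all joint PMFs $\psi$ over $S^N \times \bar{S}^N$ for which there exist $L \in \mathbb{Z}_+$, $\eta \in \Delta_L$, and $\theta_{l,n} \in \Delta_{|S|}$, for all $1 \le l \le L$, $1 \le n \le N$, such that for each $(w, \bar{w}) \in S^N \times \bar{S}^N$ where $w \sim \bar{w}$,

$$\psi(w, \bar{w}) = q^{\|\bar{w}\|} (1-q)^{N - \|\bar{w}\|} \sum_{l=1}^{L} \eta_l \prod_{n=1}^{N} \theta_{l,n}(w_n).$$

Then, $\Psi^q$ is identifiable and convex. Further, the naive Bayes algorithm is consistent with respect to $\Psi^q$.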

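Proof. To show that $\Psi^q$ is identifiable, consider $\psi, \psi' \in \Psi^q$ for which $\psi_{\bar{S}} = \psi'_{\bar{S}}$. Because a fully observed vector $w \in S^N$ is consistent only with itself, $\psi_{\bar{S}}(w) = \psi(w, w) = (1-q)^N \sum_{l=1}^{L} \eta_l \prod_{n=1}^{N} \theta_{l,n}(w_n)$, and similarly for $\psi'$. We then have for each $(w, \bar{w}) \in S^N \times \bar{S}^N$ where $w \sim \bar{w}$,

$$\begin{aligned}
\psi(w, \bar{w}) &= q^{\|\bar{w}\|} (1-q)^{N - \|\bar{w}\|} \sum_{l=1}^{L} \eta_l \prod_{n=1}^{N} \theta_{l,n}(w_n) = q^{\|\bar{w}\|} (1-q)^{-\|\bar{w}\|} \psi_{\bar{S}}(w) \\
&= q^{\|\bar{w}\|} (1-q)^{-\|\bar{w}\|} \psi'_{\bar{S}}(w) = \psi'(w, \bar{w}).
\end{aligned}$$

Hence, $\Psi^q$ is identifiable.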

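To show that $\Psi^q$ is convex, consider arbitrary PMFs $\psi, \psi' \in \Psi^q$, with parameters $(L, \eta, \theta)$ and $(L', \eta', \theta')$, respectively. For each $\lambda \in [0, 1]$, their convex combination $\psi^{\lambda}$ satisfies

$$\begin{aligned}
\psi^{\lambda}(w, \bar{w}) &= \lambda \psi(w, \bar{w}) + (1 - \lambda) \psi'(w, \bar{w}) \\
&= q^{\|\bar{w}\|} (1-q)^{N - \|\bar{w}\|} \left(\lambda \sum_{l=1}^{L} \eta_l \prod_{n=1}^{N} \theta_{l,n}(w_n) + (1 - \lambda) \sum_{l=1}^{L'} \eta'_l \prod_{n=1}^{N} \theta'_{l,n}(w_n)\right) \\
&= q^{\|\bar{w}\|} (1-q)^{N - \|\bar{w}\|} \sum_{l=1}^{L + L'} \tilde{\eta}_l \prod_{n=1}^{N} \tilde{\theta}_{l,n}(w_n)
\end{aligned}$$

for each $(w, \bar{w}) \in S^N \times \bar{S}^N$ such that $w \sim \bar{w}$, where $\tilde{\eta}_l = \lambda \eta_l$ for $l \le L$, $\tilde{\eta}_l = (1 - \lambda) \eta'_{l-L}$ for $l > L$, and for each $n$, $\tilde{\theta}_{l,n} = \theta_{l,n}$ for $l \le L$ and $\tilde{\theta}_{l,n} = \theta'_{l-L,n}$ for $l > L$. Hence, $\psi^{\lambda} \in \Psi^q$, so $\Psi^q$ is convex.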
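The identifiability and convexity arguments for $\Psi^q$ can be checked by enumeration on a tiny instance. The sketch below assumes the form of $\psi(w, \bar{w})$ given in Proposition 4 (a mixture of product PMFs over $S^N$, with each rating independently hidden with probability $q$), and uses hypothetical sizes, a hypothetical value of $q$, and random parameters; `None` encodes "?". It confirms that each such $\psi$ is a PMF, that $\psi(w, \bar{w})$ is determined by the mass $\psi(w, w)$ on fully observed vectors, and that a convex combination of two members equals the member with concatenated components.

```python
import itertools
import numpy as np

S = (0, 1)        # hypothetical rating values, also used as indices into theta
N = 3             # hypothetical number of products
Q_MISS = 0.25     # hypothetical q: probability that a rating is unobserved

def mixture(w, eta, theta):
    """sum_l eta_l * prod_n theta[l, n, w_n]: a mixture of product PMFs over S^N."""
    return sum(eta[l] * np.prod([theta[l, n, w[n]] for n in range(N)])
               for l in range(len(eta)))

def n_missing(w_bar):
    """||w_bar||: number of '?' entries, encoded here as None."""
    return sum(b is None for b in w_bar)

def psi(w, w_bar, eta, theta, q=Q_MISS):
    """Joint PMF of the assumed form; zero unless w_bar agrees with w wherever observed."""
    if any(b is not None and b != a for a, b in zip(w, w_bar)):
        return 0.0
    k = n_missing(w_bar)
    return q ** k * (1 - q) ** (N - k) * mixture(w, eta, theta)

def consistent_masks(w):
    """All w_bar with w ~ w_bar: each entry is either kept or replaced by '?'."""
    return [tuple(None if hide else a for a, hide in zip(w, h))
            for h in itertools.product((False, True), repeat=N)]

rng = np.random.default_rng(8)
eta, theta = rng.dirichlet(np.ones(2)), rng.dirichlet(np.ones(len(S)), size=(2, N))
eta2, theta2 = rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(len(S)), size=(3, N))
lam = 0.4
pairs = [(w, wb) for w in itertools.product(S, repeat=N) for wb in consistent_masks(w)]

# The joint PMF sums to one over all consistent pairs (w, w_bar).
total = sum(psi(w, wb, eta, theta) for w, wb in pairs)
# Identifiability: psi(w, w_bar) is fixed by psi(w, w), the mass on fully observed vectors.
ident_ok = all(np.isclose(psi(w, wb, eta, theta),
                          (Q_MISS / (1 - Q_MISS)) ** n_missing(wb) * psi(w, w, eta, theta))
               for w, wb in pairs)
# Convexity: a convex combination of two members is the member with concatenated components.
eta_mix = np.concatenate([lam * eta, (1 - lam) * eta2])
theta_mix = np.concatenate([theta, theta2], axis=0)
convex_ok = all(np.isclose(lam * psi(w, wb, eta, theta) + (1 - lam) * psi(w, wb, eta2, theta2),
                           psi(w, wb, eta_mix, theta_mix))
                for w, wb in pairs)
print(np.isclose(total, 1.0), ident_ok, convex_ok)   # True True True
```

The identifiability check divides by $(1 - q)$, which is why the sketch, like Proposition 4, requires $q < 1$.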

We now show that the naive Bayes algorithm $\rho$ is consistent with respect to $\Psi^q$ by using results in [38]. We will use notation in the definition of consistent CF algorithms in Section 4.3. We also denote by $\psi^*$ the true PMF and let $\Psi^q_{\bar{S}} = \{\psi_{\bar{S}} : \psi \in \Psi^q\}$ be the set of all marginals over $\bar{S}^N$ of PMFs in $\Psi^q$. According to Theorem 2 in [38], if

1. …, $\Psi^q_{\bar{S}}$, and … satisfy certain technical conditions specified in [38], and
2. there exists a constant $b > 0$ such that for all $m$ and all realizations of $\{W_m\}$, … $\ge b$,

then

$$\psi^{W_m}_{\bar{S}} \to \psi^*_{\bar{S}} \quad \text{a.s.} \tag{4}$$

We verify that condition 1 holds in our problem instance and seek a $b$ that satisfies condition 2. To do so, we denote by $(L^*, \eta^*, \theta^*)$ the parameters corresponding to $\psi^*$ and by $(L^{W_m}, \eta^{W_m}, \theta^{W_m})$ the parameters corresponding to $\psi^{W_m}$. Recall that we tune the parameter $\tau$ by cross validation over a range $\mathcal{T}$. Let $\tau^*$ be the value that we settle at. We let … Because $\psi^{W_m}$ maximizes the posterior PDF, for all $m$ and all realizations of $\{W_m\}$, we have …

Hence, condition 2 holds, establishing (4). By the continuity of KL divergence and properties of almost sure convergence, for all $\epsilon > 0$, we have

$$\Pr\left(D\left(\psi^{W_m}_{\rho} \,\|\, \psi^*_{S}\right) > \epsilon\right) \to 0,$$

which implies that $\rho$ is consistent with respect to $\Psi^q$. □



B Table of Notation 
Notation | Description
$S$ | Set of possible rating values.
$\bar{S}$ | Union of $S$ and the singleton set containing the question mark.
$N$ | Number of products.
$Y$ | Set of ratings vectors provided by honest users.
$Z$ | Set of ratings vectors provided by manipulators.
$W$ | Training set consisting of all ratings vectors.
$r$ | Fraction of ratings vectors generated by manipulators.
$M$ | Number of ratings vectors in the training set $W$.
$n$ | Number of ratings provided by an active user.
$x^k$ | Ratings vector in $\bar{S}^N$ that contains $k$ ratings provided by an active user.
$\nu_k$ | The index of the $k$th inspected product by an active user.
$p_{\nu_k, x^{k-1}, W}$ | PMF generated on the rating for product $\nu_k$, for a user with history $x^{k-1}$, based on training set $W$.
$\rho$ | A CF algorithm.
$\sigma_N$ | Set of permutations of $\{1, \dots, N\}$.
$d^{KL}_n$ | KL distortion.
$d^{RMS}_n$ | RMS distortion.
$d^{B}_n$ | Binary prediction distortion.
$\hat{x}_{\nu_k, x^{k-1}, W}$ | Scalar prediction of the rating of product $\nu_k$ for a user with history $x^{k-1}$, based on training set $W$.
$\tilde{x}_{\nu_k, x^{k-1}, W}$ | Binary prediction of the rating of product $\nu_k$ for a user with history $x^{k-1}$, based on training set $W$.
$w^m$ | $m$th user type in $W$.
$\bar{w}^m$ | $m$th ratings vector in $W$.
$w_n$ or $x_n$ | Rating of the $n$th product based on a user type.
$\bar{w}_n$ or $\bar{x}_n$ | Observed rating of the $n$th product.
$w \sim \bar{w}$ | For each $n$, either $\bar{w}_n = w_n$ or $\bar{w}_n = ?$.
$\|\bar{w}\|$ | $|\{n : \bar{w}_n = ?\}|$.
$\psi$ | Joint PMF over $S^N \times \bar{S}^N$.
$\psi_{\bar{S}}$ | Marginal PMF over $\bar{S}^N$.
$\psi_S$ | Marginal PMF over $S^N$.
$\phi$ | Joint PMF over $S^N \times \bar{S}^N$ of honest users.
$\pi$ | Joint PMF over $S^N \times \bar{S}^N$ of manipulators.
$\Psi$ | Set of joint PMFs over $S^N \times \bar{S}^N$ of interest.
$\Psi_{\bar{S}}$ | Set of marginal PMFs over $\bar{S}^N$ of interest.
$\psi^{W}_{\rho}$ | PMF for $x$ generated by algorithm $\rho$ based on training set $W$.
$\psi^{W}_{\rho}(\cdot \mid x)$ | Conditional PMF for $x$ conditioned on ratings vector $x \in \bar{S}^N$, generated by algorithm $\rho$ based on training set $W$.
$\Delta_k$ | Simplex $\{(t_1, \dots, t_k) : t_j \ge 0\ \forall\, 1 \le j \le k,\ \sum_{j=1}^{k} t_j = 1\}$.
$\hat{d}^{RMS}$ | Empirical RMS distortion.
$\hat{e}^{RMS}$ | Empirical RMS prediction error.
$X$ | Set of ratings vectors provided by active users.
$V$ | Validation set of ratings vectors.
$\nu^x_k$ | The index of the $k$th inspected product by an active user with ratings vector $x$.

Table 1: Table of notation.